Appendix H — Activities

I have used this book in classes in several ways. There does not seem to be one best way; what works depends on the enthusiasm and size of the group. If students are able to commit to reading the chapter before the lecture, then I have found that using class time for group-based projects and discussion works well. Each week, create small groups of two to four students, and create new groups every week to give students a chance to work with new people. Then, following a “think-pair-share” exercise (Lyman 1981), have them work through most exercises, first by themselves, then comparing with their group, and finally sharing selected answers with the class. I recommend creating a shared Google Doc and using it where helpful to make it easier to share answers.
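One way to form fresh groups each week is to shuffle the class list and deal students into groups. The following is only a minimal sketch: the roster, the function name `assign_groups`, and the seed are all hypothetical, and it assumes a class of at least two students.

```r
# Hypothetical roster; replace with the actual class list.
students <- c("Ana", "Ben", "Chloe", "Dev", "Emi",
              "Farid", "Grace", "Hugo", "Ines", "Jun")

# Shuffle the roster, then deal students round-robin into the smallest
# number of groups that keeps every group at or below max_size.
assign_groups <- function(students, max_size = 4) {
  n_groups <- ceiling(length(students) / max_size)
  shuffled <- sample(students)
  group_id <- rep(seq_len(n_groups), length.out = length(shuffled))
  split(shuffled, group_id)
}

set.seed(853) # Arbitrary seed; change it each week to get new groups.
groups <- assign_groups(students)
groups
```

Because students are dealt round-robin, group sizes differ by at most one, so with the default `max_size = 4` every group has between two and four members.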

In terms of timing and coverage, I have found that, provided Part I “Foundations” is covered, the rest of the chapters are fairly independent. While I try to cover one chapter per week, students sometimes take a while to get started, and the first three chapters take about three weeks (even though there is not much to the first chapter).

Typically, it all starts to come together somewhere between the first and second papers. It is important to return Paper 1 to students quickly so that they can incorporate lessons from it into future papers.

Chapter 1—Telling stories with data

Chapter 2—Drinking from a fire hose

Chapter 3—Reproducible workflows

Chapter 4—Writing research

Chapter 5—Static communication

library(tidyverse)

# Scatterplot of the annual level of Lake Huron, 1875 to 1972
tibble(year = 1875:1972,
       level = as.numeric(datasets::LakeHuron)) |>
  ggplot(aes(x = year, y = level)) +
  geom_point()

# Bar chart of tree heights
datasets::trees |>
  as_tibble() |>
  ggplot(aes(x = Height)) +
  geom_bar()

# Line chart of weight over time, one line per chick
datasets::ChickWeight |>
  as_tibble() |>
  ggplot(aes(x = Time, y = weight, group = Chick)) +
  geom_line()

# Histogram of yearly sunspot numbers, 1700 to 1988
tibble(year = 1700:1988,
       sunspots = as.numeric(datasets::sunspot.year) |> round(0)) |>
  ggplot(aes(x = sunspots)) +
  geom_histogram()

library(palmerpenguins)

ggplot(data = penguins,
       aes(x = flipper_length_mm,
           y = body_mass_g)) +
  geom_point(aes(color = species,
                 shape = species),
             size = 3,
             alpha = 0.8) +
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(
    title = "Penguin size, Palmer Station LTER",
    subtitle = "Flipper length and body mass for Adelie, Chinstrap and Gentoo Penguins",
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Penguin species",
    shape = "Penguin species"
  ) +
  theme_minimal() +
  theme(
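    # A numeric legend.position is deprecated as of ggplot2 3.5.0,
    # which uses legend.position.inside for this instead
    legend.position = c(0.2, 0.7),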
    plot.title.position = "plot",
    plot.caption = element_text(hjust = 0, face = "italic"),
    plot.caption.position = "plot"
  )

datasets::morley |> 
  tibble()
# A tibble: 100 × 3
    Expt   Run Speed
   <int> <int> <int>
 1     1     1   850
 2     1     2   740
 3     1     3   900
 4     1     4  1070
 5     1     5   930
 6     1     6   850
 7     1     7   950
 8     1     8   980
 9     1     9   980
10     1    10   880
# … with 90 more rows

Chapter 6—Farm data

Chapter 7—Gather data

Chapter 8—Hunt data

Chapter 9—Clean and prepare

Chapter 10—Store and share

Chapter 11—Exploratory data analysis

Chapter 12—Linear models

Chapter 13—Generalized linear models

Chapter 14—Causality from observational data

Chapter 15—Multilevel regression with post-stratification

Chapter 16—Text as data

Appendix A—R essentials


  1. Note for instructors: There are always a small number of students who struggle to get the PDF setup working locally. In the worst case, they can use Posit Cloud, where it works.

  2. Note for instructors: 1) If you have hidden your GitHub email, then make sure you use the alias when you add an email address locally. 2) There will always be a few students who cannot get git working locally. I find the best approach is to triage by pairing them with an advanced student while you demonstrate, and then to deal with any remaining issues individually in office hours.
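
As a sketch of the first point in the note above, the alias can be set from R with `usethis::use_git_config()`. The name and address here are hypothetical placeholders. GitHub's alias has the form `ID+USERNAME@users.noreply.github.com`, and each student's own alias is shown in their GitHub email settings.

```r
# Assumes the usethis package is installed.
# Both values below are hypothetical placeholders; each student should
# substitute their own name and the alias from their GitHub email settings.
usethis::use_git_config(
  user.name = "Jane Doe",
  user.email = "1234567+janedoe@users.noreply.github.com"
)

# Print a report of the git and GitHub setup to confirm the change.
usethis::git_sitrep()
```

By default this writes to the user-level git configuration, so it applies to every repository on that computer rather than only the current project.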