4 Writing research

Chapman and Hall/CRC published this book in July 2023. You can purchase that here. This online version has some updates to what was printed.

Prerequisites

Read By Design: Planning Research on Higher Education, (Light, Singer, and Willett 1990)
- Focus on Chapter 2 “What are your questions”, which provides strategies for developing good research questions.
Read On Writing Well, (any edition is fine) (Zinsser 1976)
- Focus on Parts I “Principles”, and II “Methods”, which provide a “how-to” for a particularly effective style of writing.
Read Novelist Cormac McCarthy’s tips on how to write a great science paper, (Savage and Yeh 2019)
- This paper provides specific tips that will improve your writing.
Read Publication, publication, (G. King 2006)
- This paper details a strategy for moving from a replication to a publishable academic paper.
Watch Quantitative Editing, (Bronner 2021)
- The video provides strategies for quantitative-based writing based on experience as a quantitative editor at FiveThirtyEight.
Read Smoking and carcinoma of the lung, (Doll and Hill 1950)
- The paper provides an excellent example of a data section.
Read How to write usefully (Graham 2020)
- A blog post about writing something true and important that the reader did not already know.
Read one of the following well-written quantitative papers:
- Asset prices in an exchange economy, (Lucas 1978)
- Individuals, institutions, and innovation in the debates of the French Revolution, (Barron et al. 2018)
- Modeling: optimal marathon performance on the basis of physiological factors, (Joyner 1991)
- On reproducible econometric research, (Koenker and Zeileis 2009)
- Prevented mortality and greenhouse gas emissions from historical and projected nuclear power, (Kharecha and Hansen 2013)
- Seeing like a market, (Fourcade and Healy 2017)
- Simpson’s paradox and the hot hand in basketball, (Wardrop 1995)
- Some studies in machine learning using the game of checkers, (Samuel 1959)
- Statistical methods for assessing agreement between two methods of clinical measurement, (Bland and Altman 1986)
- Surgical Skill and Complication Rates after Bariatric Surgery, (Birkmeyer et al. 2013)
- The mundanity of excellence: An ethnographic report on stratification and Olympic swimmers, (Chambliss 1989)
- The probable error of a mean, (Student 1908)
Read one of the following articles from The New Yorker:
- Funny Like a Guy, Tad Friend, 4 April 2011
- Going the Distance, David Remnick, 19 January 2014
- How the First Gravitational Waves Were Found, Nicola Twilley, 11 February 2016
- Happy Feet, Alexandra Jacobs, 7 September 2009
- Levels of the Game, John McPhee, 31 May 1969
- Reporting from Hiroshima, John Hersey, 23 August 1946
- The Catastrophist, Elizabeth Kolbert, 22 June 2009
- The Quiet German, George Packer, 24 November 2014
- The Pursuit of Beauty, Alec Wilkinson, 1 February 2015
Read one of the following articles from other publications:
- Blades of Glory, Holly Anderson, Grantland
- Born to Run, Walt Harrington, The Washington Post
- Dropped, Jason Fagone, Grantland
- Federer as Religious Experience, David Foster Wallace, The New York Times Magazine
- Generation Why?, Zadie Smith, The New York Review of Books
- One hundred years of arm bars, David Samuels, Grantland
- Out in the Great Alone, Brian Phillips, ESPN
- Pearls Before Breakfast, Gene Weingarten, The Washington Post
- Resurrecting The Champ, J.R. Moehringer, Los Angeles Times
- The Cult of “Jurassic Park”, Bryan Curtis, Grantland
- The House that Hova Built, Zadie Smith, The New York Times
- The Re-Education of Chris Copeland, Flinder Boyd, SB Nation
- The Sea of Crisis, Brian Phillips, Grantland
- The Webb Space Telescope Will Rewrite Cosmic History. If It Works., Natalie Wolchover, Quanta Magazine

Key concepts and skills

Writing is a critical skill—perhaps the most important—of all the skills required to analyze data. The only way to get better at writing is to write, ideally every day.
When we write, although the benefits typically accrue to ourselves, we must nonetheless write for the reader. This means having one main message that we want to communicate, and thinking about where they are, rather than where we are.
We want to get to a first draft as quickly as possible. Even if it is horrible, the difference between a first draft existing and not is enormous. At that point we start to rewrite. When doing so we aim to maximize clarity, often by removing unnecessary words.
We typically begin with some area of interest and then develop research questions, datasets, and analysis in an iterative way. Through this process we come to a better understanding of what we are doing.

Software and packages

knitr (Xie 2023)
tidyverse (Wickham et al. 2019)
tinytable (Arel-Bundock 2024)

library(knitr)
library(tidyverse)
library(tinytable)

4.1 Introduction

If you want to be a writer, you must do two things above all others: read a lot and write a lot. There’s no way around these two things that I’m aware of, no shortcut.

S. King (2000, 145)

We predominately tell stories with data by writing them down. Writing allows us to communicate efficiently. It is also a way to work out what we believe and allows us to get feedback on our ideas. Effective papers are tightly written and well-organized, which makes their story flow well. Proper sentence structure, spelling, vocabulary, and grammar are important because they remove distractions and enable each aspect of the story to be clearly articulated.

This chapter is about writing. By the end of it, you will have a better idea of how to write short, detailed, quantitative papers that communicate what you want them to, and do not waste the reader’s time. We write for the reader, not for ourselves. Specifically, we write to be useful to the reader. This means clearly communicating something new, true, and important (Graham 2020). That said, the greatest benefit of writing nonetheless often accrues to the writer, even when we write for our audience. This is because the process of writing is a way to work out what we think and how we came to believe it.

Aspects of this chapter can feel a little like a list. It may be that you go through those aspects quickly initially, and then return to them as needed.

4.2 Writing

The way to do a piece of writing is three or four times over, never once. For me, the hardest part comes first, getting something—anything—out in front of me. Sometimes in a nervous frenzy I just fling words as if I were flinging mud at a wall. Blurt out, heave out, babble out something—anything—as a first draft.

McPhee (2017, 159)

The process of writing is a process of rewriting. The critical task is to get to a first draft as quickly as possible. Until that complete first draft exists, it is useful to try not to delete, or even revise, anything that was written, regardless of how bad it may seem. Just write. (This advice is directed at less-experienced writers. As you get more experience, you may find that your approach changes.)

One of the most intimidating stages is a blank page, and we deal with this by immediately adding headings such as: “Introduction”, “Data”, “Model”, “Results”, and “Discussion”. We then add fields in the top matter for the various bits and pieces that are needed, such as “title”, “date”, “author”, and “abstract”. This creates a generic outline, which will play the role of mise en place for the paper. By way of background, mise en place is a preparatory phase in a professional kitchen when ingredients are sorted, prepared, and arranged for easy access. This ensures that everything that is needed is available without unnecessary delay. Putting together an outline plays the same role when writing quantitative papers, and is akin to placing on the counter the ingredients that we will use to prepare dinner (McPhee 2017).
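One way to set up this generic outline is to have R write it for us. The following is a minimal sketch rather than a required workflow. The file name "paper.qmd", the placeholder values, and the choice of PDF output are all assumptions made for illustration.

```r
# A generic outline: top matter fields followed by empty section headings.
# The "TODO" values and "format: pdf" are placeholders, not requirements.
outline <- c(
  "---",
  'title: "TODO"',
  'author: "TODO"',
  "date: today",
  'abstract: "TODO"',
  "format: pdf",
  "---",
  "",
  "# Introduction",
  "",
  "# Data",
  "",
  "# Model",
  "",
  "# Results",
  "",
  "# Discussion"
)

# Write to a temporary folder here; in practice use the project folder
paper_path <- file.path(tempdir(), "paper.qmd")
writeLines(outline, paper_path)
```

Opening the resulting file gives us somewhere to start adding dot points, which is the purpose of the outline.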

Having established this generic outline, we need to develop an understanding of what we are exploring through thinking deeply about our research question. In theory, we develop a research question, answer it, and then do all the writing; but that rarely actually happens (Franklin 2005). Instead, we typically have some idea of the question and the shape of an answer, and these become less vague as we write. This is because it is through the process of writing that we refine our thinking (S. King 2000, 131). Having put down some thoughts about the research question, we can start to add dot points in each of the sections, adding sub-sections with informative sub-headings as needed. We then go back and expand those dot points into paragraphs. While we do this our thinking is influenced by a web of other researchers, but also other aspects such as our circumstances and environment (Latour 1996).

While writing the first draft you should ignore the feeling that you are not good enough, or that it is impossible. Just write. You need words on paper, even if they are bad, and the first draft is when you accomplish this. Remove distractions and focus on writing. Perfectionism is the enemy, and should be set aside. Sometimes this can be accomplished by getting up very early to write, by creating a deadline, or forming a writing group. Creating a sense of urgency can be useful and one option is to not bother with adding proper citations as you go, which could slow you down, and instead just add something like “[TODO: CITE R HERE]”. Do the same with graphs and tables. That is, include textual descriptions such as “[TODO: ADD GRAPH THAT SHOWS EACH COUNTRY OVER TIME HERE]” instead of actual graphs and tables. Focus on adding content, even if it is bad. When this is all done, a first draft exists.

This first draft will be poorly written and far from great. But it is by writing a bad first draft that you can get to a good second draft, a great third draft, and eventually excellence (Lamott 1994, 20). That first draft will be too long, it will not make sense, it will contain claims that cannot be supported, and some claims that should not be. If you are not embarrassed by your first draft, then you have not written it quickly enough.

Use the “delete” key extensively, as well as “cut” and “paste”, to turn that first draft into a second. Printing the draft and using a red pen to move or remove words, sentences, and entire paragraphs is especially helpful. The process of going from a first draft to a second draft is best done in one sitting, to help with the flow and consistency of the story. One aspect of this first rewrite is enhancing the story that we want to tell. Another aspect is taking out everything that is not the story (S. King 2000, 57).

It can be painful to remove work that seems good even if it does not quite fit into what the draft is becoming. One way to make this less painful is to make a temporary document, perhaps named “debris.qmd”, to save these unwanted paragraphs instead of immediately deleting them. Another strategy is to comment out the paragraphs. That way you can still look at the raw file and notice aspects that could be useful.

As you go through what was written in each of the sections try to bring some sense to it with special consideration to how it supports the story that is developing. This revision process is the essence of writing (McPhee 2017, 160). You should also fix the references, and add the real graphs and tables. As part of this rewriting process, the paper’s central message tends to develop, and the answers to the research questions tend to become clearer. At this point, aspects such as the introduction can be returned to and, finally, the abstract. Typos and other issues affect the credibility of the work. So these should be fixed as part of the second draft.

At this point the draft is starting to become sensible. The job is to now make it brilliant. Print it and again go through it on paper. Try to remove everything that does not contribute to the story. At about this stage, you may start to get too close to the paper. This is a great opportunity to give it to someone else for their comments. Ask for feedback about what is weak about the story. After addressing these, it can be helpful to go through the paper once more, this time reading it aloud. A paper is never “done” and it is more that at a certain point you either run out of time or become sick of the sight of it.

4.3 Asking questions

Both qualitative and quantitative approaches have their place. In this book we focus on quantitative approaches. Nonetheless qualitative research is important, and often the most interesting work has a little of both. When conducting quantitative analysis, we are subject to issues such as data quality, measurement, and relevance. We are often especially interested in trying to tease out causality. Regardless, we are trying to learn something about the world. Our research questions need to take this all into account.

Broadly, and at the risk of over-simplification, there are two ways to go about research:

data-first; or
question-first.

But it is not a binary, and often research proceeds by iterating between data and questions, organized around a research puzzle (Gustafsson and Hagström 2017). Light, Singer, and Willett (1990, 39) describe this approach as a spiral of \(\mbox{theory}\rightarrow\mbox{data}\rightarrow\mbox{theory}\rightarrow\mbox{data}\), etc. For instance, a question-first approach could be theory-driven or data-driven, as could a data-first approach. An alternative framing is to compare an inductive, or specific-to-general, approach with a deductive, or general-to-specific, approach to research.

Consider two examples:

Mok et al. (2022) examine eight billion unique listening events from 100,000 Spotify users to understand how users explore content. They find a clear relationship between age and behavior, with younger users exploring unknown content less than older users, despite having more diverse consumption. While it is clear that research questions around discovery and exploration drive this paper, it would not have been possible without access to this dataset. There likely would have been an iterative process where potential research questions and potential datasets were considered, before the ultimate match.
Think of wanting to explore the neonatal mortality rate (NMR), which was introduced in Chapter 2. One might be interested in what NMR could look like in Sub-Saharan Africa in 20 years. This would be question-first. But within this, there could be: theory-driven aspects, such as what do we expect based on biological relationships with other quantities; or data-driven aspects such as collecting as much data as possible to make forecasts. An alternative, purely data-driven approach would be having access to the NMR and then working out what is possible.

4.3.1 Data-first

When being data-first, the main issue is working out the questions that can be reasonably answered with the available data. When deciding what these are, it is useful to consider:

Theory: Is there a reasonable expectation that there is something causal that could be determined? For instance, Mark Christensen used to joke that if the question involved charting the stock market, then it might be better to hark back to The Odyssey and read bull entrails on a fire, because at least that way you would have something to eat at the end of the day. Questions usually need to have some plausible theoretical underpinning to help avoid spurious relationships. One way to develop theory, given data, is to consider “of what is this an instance?” (Rosenau 1999, 7). Following that approach, one tries to generalize beyond the specific setting. For instance, thinking of some particular civil war as an instance of all civil wars. The benefit of this is it focuses attention on the general attributes needed for building theory.
Importance: There are plenty of trivial questions that can be answered, but it is important to not waste our time or that of the reader. Having an important question can also help with motivation when we find ourselves in, say, the fourth straight week of cleaning data and debugging code. In industry it can also make it easier to attract talented employees and funding. That said, a balance is needed; the question needs to have a decent chance of being answered. Attacking a generation-defining question might be best broken into chunks.
Availability: Is there a reasonable expectation of additional data being available in the future? This could allow us to answer related questions and turn one paper into a research agenda.
Iteration: Is this something that could be run multiple times, or is it a once-off analysis? If it is the former, then it becomes possible to start answering specific research questions and then iterate. But if we can only get access to the data once then we need to think about broader questions.

There is a saying, sometimes attributed to Xiao-Li Meng, that all of statistics is a missing data problem. And so paradoxically, another way to ask data-first questions is to think about the data we do not have. For instance, returning to the neonatal and maternal mortality examples discussed earlier, one problem is that we do not have complete cause of death data. If we did, then we could count the number of relevant deaths. (Castro et al. (2023) remind us that this simplistic hypothetical would be complicated in reality because there are sometimes causes of death that are not independent of other causes.) Having established some missing data problem, we can take a data-driven approach. We look at the data we do have, and then ask research questions that speak to the extent that we can use that to approximate our hypothetical dataset.

Shoulders of giants

Xiao-Li Meng is the Whipple V. N. Jones Professor of Statistics at Harvard University. After earning a PhD in Statistics from Harvard University in 1990 he was appointed as an assistant professor at the University of Chicago where he was promoted to professor in 2000. He moved to Harvard in 2001, serving as chair of the statistics department between 2004 and 2012. He has published on a wide range of topics including missing data—Meng (1994) and Meng (2012)—and data quality—Meng (2018). He was awarded the COPSS Presidents’ Award in 2001.

One way that some researchers are data-first is that they develop a particular expertise in the data of some geographical or historical circumstance. For instance, they may be especially knowledgeable about, say, the present-day United Kingdom, or late nineteenth century Japan. They then look at the questions that other researchers are asking in other circumstances, and bring their data to that question. For instance, it is common to see a particular question initially asked for the United States, and then a host of researchers answer that same question for the United Kingdom, Canada, Australia, and many other countries.

There are a number of negatives to data-first research, including the fact that it can be especially uncertain. It can also struggle for external validity because there is always a worry about a selection effect.

A variant of data-driven research is model-driven research. Here a researcher becomes an expert on some particular statistical approach and then applies that approach to appropriate contexts.

4.3.2 Question-first

When trying to be question-first, there is the inverse issue of being concerned about data availability. The “FINER framework” is used in medicine to help guide the development of research questions. It recommends asking questions that are: Feasible, Interesting, Novel, Ethical, and Relevant (Hulley et al. 2007). Farrugia et al. (2010) build on FINER with PICOT, which recommends additional considerations: Population, Intervention, Comparison group, Outcome of interest, and Time.

It can feel overwhelming trying to write out a question. One way to go about it is to ask a very specific question. Another is to decide whether we are interested in descriptive, predictive, inferential, or causal analysis. These then lead to different types of questions. For instance:

descriptive analysis: “What does \(x\) look like?”;
predictive analysis: “What will happen to \(x\)?”;
inferential: “How can we explain \(x\)?”; and
causal: “What impact does \(x\) have on \(y\)?”.

Each of these has a role to play. Since the credibility revolution (Angrist and Pischke 2010), causal questions answered with a particular approach have been predominant. This has brought some benefit, but not without cost. Descriptive analysis can be just as, indeed sometimes more, illuminating, and is critical (Sen 1980). The nature of the question being asked matters less than being genuinely interested in answering it.

Time will often be constrained, possibly in an interesting way, and this can guide the specifics of the research question. If we are interested in the effect of a celebrity’s announcements on the stock market, then that can be done by looking at stock prices before and after the announcement. But what if we are interested in the effect of a cancer drug on long-term outcomes? If the effect takes 20 years, then we must either wait a while, or we need to look at people who were treated 20 years ago. We then have selection effects and different circumstances compared to if we were to administer the drug today. Often the only reasonable thing to do is to build a statistical model, but that brings other issues.

4.4 Answering questions

4.4.1 Counterfactuals and bias

The creation of a counterfactual is often crucial when answering questions. A counterfactual is an if-then statement in which the “if” is false. Consider the example of Humpty Dumpty in Through the Looking-Glass by Lewis Carroll:

“What tremendously easy riddles you ask!” Humpty Dumpty growled out. “Of course I don’t think so! Why, if ever I did fall off—which there’s no chance of—but if I did—” Here he pursed his lips and looked so solemn and grand that Alice could hardly help laughing. “If I did fall,” he went on, “The King has promised me—with his very own mouth-to-to-” “To send all his horses and all his men,” Alice interrupted, rather unwisely.

Carroll (1871)

Humpty is satisfied with what would happen if he were to fall off, even though he is convinced that this would never happen. It is this comparison group that often determines the answer to a question. For instance, in Chapter 15 we consider the effect of VO2 max on a cyclist’s chance of winning a race. If we compare over the general population then it is an important variable. But if we only compare over well-trained athletes, then it is less important, because of selection.
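To see how the comparison group can change the answer, consider a small simulation. This is a sketch with invented numbers, not the analysis from Chapter 15. We assume a hypothetical population where race performance depends on VO2 max plus other factors, and we define "well-trained athletes" as those in the top one per cent of VO2 max.

```r
set.seed(853)

n <- 100000

# Hypothetical population; every number here is an assumption for illustration
vo2_max <- rnorm(n, mean = 40, sd = 8)
performance <- 0.5 * vo2_max + rnorm(n, mean = 0, sd = 4)

# Compare over the general population
cor_population <- cor(vo2_max, performance)

# Compare over only well-trained athletes (top one per cent of VO2 max)
is_athlete <- vo2_max > quantile(vo2_max, 0.99)
cor_athletes <- cor(vo2_max[is_athlete], performance[is_athlete])

round(c(population = cor_population, athletes = cor_athletes), 2)
```

The underlying relationship is the same for everyone in this simulation, yet the correlation among the selected group is weaker, because selection leaves little variation in VO2 max to compare over.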

Two aspects of the data to be especially aware of when deciding on a research question are selection bias and measurement bias.

Selection bias occurs when the results depend on who is in the sample. One of the pernicious aspects of selection bias is that we need to know about its existence in order to do anything about it. But many default diagnostics will not identify selection bias. In A/B testing, which we discuss in Chapter 8, A/A testing is a slight variant where we create groups and compare them before imposing a treatment (hence the A/A nomenclature). This effort to check whether the groups are initially the same can help to identify selection bias. More generally, comparing the properties of the sample, such as age-group, gender, and education, with characteristics of the population can assist as well. But the fundamental problem with selection bias and observational data is that we know the people about whom we have data are different in at least one way from those about whom we do not! But we do not know in what other ways they may be different.
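Both checks can be sketched with simulated data. The sample, the single characteristic (age), and the population share used below are all hypothetical, and a real check would compare many characteristics rather than one.

```r
set.seed(853)

# Hypothetical sample of 1,000 people with one pre-treatment characteristic
n <- 1000
age <- round(runif(n, min = 18, max = 80))

# A/A check: randomly create two groups, but do not impose any treatment
group <- sample(rep(c("A1", "A2"), each = n / 2))

# Compare the groups before treatment
tapply(age, group, mean)
aa_check <- t.test(age ~ group)
aa_check$p.value

# Compare the sample with an assumed, known population characteristic
population_share_under_40 <- 0.35
sample_share_under_40 <- mean(age < 40)
c(sample = sample_share_under_40, population = population_share_under_40)
```

A large difference between the groups, or between the sample and the population, would be a warning sign. The absence of a difference is weaker evidence, because the groups may still differ in ways that we did not measure.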

Selection bias can pervade many aspects of our analysis. Even a sample that is initially representative may become biased over time. For instance, survey panels, which we discuss in Chapter 6, need to be updated from time to time because the people who do not get anything out of participating stop responding.

Another bias to be aware of is measurement bias, which occurs when the results are affected by how the data were collected. A common example is that if we ask respondents their income, then we may get different answers in person than in an online survey.

4.4.2 Estimands

We will typically be interested in using data to answer our question and it is important that we are clear about specifics. For instance, we might be interested in the effect of smoking on life expectancy. In that case, there is some true effect, which we can never know, and that true effect is called the “estimand” (Little and Lewis 2021). Defining the estimand at some point in the paper, ideally in the introduction, is critical (Lundberg, Johnson, and Stewart 2021). This is because it is easy to slightly change some specific aspect of the analysis plan and end up accidentally estimating something different (Kahan et al. 2022). Estimands are beginning to be required by some medical regulators (Kahan et al. 2024). For an estimand we are looking for a clear description of what the effect represents (Kahan et al. 2023). An “estimator” is a process by which we use the data that we have available to generate an “estimate” of the “estimand”. Efron and Morris (1977) provide a discussion of estimators and related concerns.

Bueno de Mesquita and Fowler (2021, 94) describe the relationship between an estimate and an estimand as:

\[ \mbox{Estimate = Estimand + Bias + Noise} \]

Bias refers to issues with an estimator systematically providing estimates that are different from the estimand, while noise refers to non-systematic differences. For instance, consider a standard Normal distribution. We might be interested in understanding the average, which would be our estimand. We know (in a way that we can never with real data) that the estimand is zero. Let us draw ten times from that distribution. One estimator we could use to produce an estimate is: sum the draws and divide by the number of draws. Another is to order the draws and find the middle observation. To be more specific, we will simulate this situation (Table 4.1).

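set.seed(853)

# Simulate four sets of draws from a standard Normal distribution, of
# increasing size, labeling each draw with the size of the set it belongs to
tibble(
  num_draws = c(
    rep(10, times = 10),
    rep(100, times = 100),
    rep(1000, times = 1000),
    rep(10000, times = 10000)
  ),
  draw = rnorm(
    n = length(num_draws),
    mean = 0,
    sd = 1)
  ) |> 
  summarise(
    # Estimator one: sum the draws and divide by the number of draws
    estimator_one = sum(draw) / unique(num_draws),
    # Estimator two: order the draws and take the middle observation
    estimator_two = sort(draw)[round(unique(num_draws) / 2, 0)],
    .by = num_draws
  ) |>
  # Format the results as a table
  tt() |> 
  style_tt(j = 2:3, align = "r") |> 
  format_tt(digits = 2, num_mark_big = ",", num_fmt = "decimal") |> 
  setNames(c("Number of draws", "Estimator one", "Estimator two"))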

Table 4.1: Comparing the results of two estimators of the average of random draws as the number of draws increases

Number of draws	Estimator one	Estimator two
10	-0.58	-0.82
100	-0.06	-0.07
1,000	0.06	0.04
10,000	-0.01	-0.01

As the number of draws increases, the effect of noise diminishes, and what remains in our estimates reflects the bias of our estimators. In this example, we know what the truth is, but when considering real data it can be more difficult to know what to do. Hence the importance of being clear about what the estimand is, before turning to generating estimates.
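A single set of draws cannot separate bias from noise, because we only see their sum. One way to approximate each, sketched here under the assumption that we can repeat the sampling many times, is to apply both estimators to 10,000 repeated sets of ten draws. The average difference from the estimand then approximates the bias, and the spread of the estimates approximates the noise. The number of repeats is an arbitrary choice.

```r
set.seed(853)

estimand <- 0 # The average of a standard Normal, known by construction
num_draws <- 10
num_repeats <- 10000

# Each column is one repeated set of ten draws
draws <- matrix(rnorm(num_draws * num_repeats), nrow = num_draws)

# The same two estimators as in the simulation above
estimator_one <- function(x) sum(x) / length(x)
estimator_two <- function(x) sort(x)[round(length(x) / 2, 0)]

estimates_one <- apply(draws, 2, estimator_one)
estimates_two <- apply(draws, 2, estimator_two)

bias_and_noise <- data.frame(
  estimator = c("Estimator one", "Estimator two"),
  bias = c(mean(estimates_one), mean(estimates_two)) - estimand,
  noise = c(sd(estimates_one), sd(estimates_two))
)

bias_and_noise
```

With ten draws, estimator two takes the fifth of the ten ordered observations, which is the lower of the two middle values. Its estimates therefore tend to sit slightly below zero, while those of estimator one center on zero.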

4.4.3 Directed Acyclic Graphs

When we are thinking about the variables we will use to answer our question, it can help to be specific about what we mean. It is easy to get caught up in observational data and trick ourselves. We should think hard, and use all the tools available to us. One framework that can help with thinking hard about our data is the use of directed acyclic graphs (DAGs). DAG is a fancy name for a flow diagram, and drawing one involves adding arrows and lines between the variables to indicate the relationship between them.

To construct them we use Graphviz, which is an open-source package for graph visualization and is built into Quarto. The code needs to be wrapped in a “dot” chunk rather than “R”, and the chunk options are set with “//|” instead of “#|”. Alternatives that do not require this include the use of DiagrammeR (Iannone 2022) and ggdag (Barrett 2021). We provide the whole chunk for the first DAG, but only the code for the others.

```{dot}
//| label: fig-dot-firstdag-quarto
//| fig-cap: "We expect a causal relationship between x and y, where x influences y"
//| fig-width: 4
digraph D {
  node [shape=plaintext, fontname = "helvetica"];
  
  {rank=same x y};
  
  x -> y;
}
```

Figure 4.1: We expect a causal relationship between x and y, where x influences y

In Figure 4.1, we are saying that we think x causes y.

We could build another DAG where the situation is less clear. To make the examples a little easier to follow, we will switch to thinking about a hypothetical relationship between income and happiness, with consideration of variables that could affect that relationship. In this first one we consider the relationship between income and happiness, along with education (Figure 4.2).

digraph D {
  
  node [shape=plaintext, fontname = "helvetica"];
  
  a [label = "Income"];
  b [label = "Happiness"];
  c [label = "Education"];
  
  { rank=same a b};
  
  a->b;
  c->{a, b};
}

Figure 4.2: Education is a confounder that affects the relationship between income and happiness

In Figure 4.2, we think income causes happiness. But we also think that education causes happiness, and that education also causes income. That relationship is a “backdoor path”, and failing to adjust for education in a regression could overstate the extent of the relationship, or even create a spurious relationship, between income and happiness in our analysis. That is, we may think that changes in income are causing changes in happiness, but it could be that education is changing them both. That variable, in this case, education, is called a “confounder”.
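A simulation can make the consequence of a confounder concrete. In this sketch all variables are hypothetical and standardized, and the effect sizes are invented. We set the true effect of income on happiness to 0.5, and let education affect both.

```r
set.seed(853)

n <- 10000

# Education causes both income and happiness; income also causes happiness.
# The coefficients are assumptions chosen for illustration.
education <- rnorm(n)
income <- 1 * education + rnorm(n)
happiness <- 0.5 * income + 1 * education + rnorm(n)

# Without adjusting for the confounder
unadjusted <- coef(lm(happiness ~ income))[["income"]]

# Adjusting for the confounder
adjusted <- coef(lm(happiness ~ income + education))[["income"]]

round(c(unadjusted = unadjusted, adjusted = adjusted), 2)
```

The unadjusted estimate overstates the relationship because it also picks up the backdoor path through education, whereas the adjusted estimate is close to the 0.5 that we built into the simulation.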

Hernán and Robins (2023, 83) discuss an interesting case where a researcher was interested in whether one person looking up at the sky makes others look up at the sky also. There was a clear relationship between the responses of both people. But it was also the case that there was noise in the sky. It was unclear whether the second person looked up because the first person looked up, or they both looked up because of the noise. When using experimental data, randomization allows us to avoid this concern, but with observational data we cannot rely on that. It is also not the case that bigger data necessarily get around this problem for us. Instead, we should think carefully about the situation, and DAGs can help with that.

If there are confounders, but we are still interested in causal effects, then we need to adjust for them. One way is to include them in the regression. But the validity of this requires several assumptions. In particular, Gelman and Hill (2007, 169) warn that our estimate will only correspond to the average causal effect in the sample if we include all of the confounders and have the right model. Putting the second requirement to one side, and focusing only on the first, if we do not think about and observe a confounder, then it can be difficult to adjust for it. And this is an area where both domain expertise and theory can bring considerable weight to an analysis.

In Figure 4.3 we again consider that income causes happiness. But, if income also causes children, and children also cause happiness, then we have a situation where it would be tricky to understand the effect of income on happiness.

digraph D {

  node [shape=plaintext, fontname = "helvetica"];
  
  a [label = "Income"];
  b [label = "Happiness"];
  c [label = "Children"];
  
  { rank=same a b};
  
  a->{b, c};
  c->b;
}

Figure 4.3: Children as a mediator between income and happiness

In Figure 4.3, children is called a “mediator” and we would not adjust for it if we were interested in the effect of income on happiness. If we were to adjust for it, then some of the effect that should be attributed to income would instead be attributed to children.
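The same style of simulation shows what adjusting for a mediator does. Again, the variables are hypothetical and standardized, and the coefficients are invented. Here income affects happiness directly (0.3) and also through children (0.5 multiplied by 0.4), so the total effect that we built in is 0.5.

```r
set.seed(853)

n <- 10000

# Income causes children; both income and children cause happiness.
# The coefficients are assumptions chosen for illustration.
income <- rnorm(n)
children <- 0.5 * income + rnorm(n)
happiness <- 0.3 * income + 0.4 * children + rnorm(n)

# Not adjusting for the mediator recovers the total effect of income
total_effect <- coef(lm(happiness ~ income))[["income"]]

# Adjusting for the mediator leaves only the direct effect of income
direct_only <- coef(lm(happiness ~ income + children))[["income"]]

round(c(total_effect = total_effect, direct_only = direct_only), 2)
```

If our interest is the overall effect of income on happiness, then the model that adjusts for children understates it.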

Finally, in Figure 4.4 we have yet another similar situation, where we think that income causes happiness. But this time both income and happiness also cause exercise. For instance, if you have more money then it may be easier to exercise, but also it may be easier to exercise if you are happier.

digraph D {

  node [shape=plaintext, fontname = "helvetica"];
  
  a [label = "Income"];
  b [label = "Happiness"];
  c [label = "Exercise"];
  
  { rank=same a b};
  
  a->{b c};
  b->c;
}

Figure 4.4: Exercise as a collider affecting the relationship between income and happiness

In this case, exercise is called a “collider” and if we were to condition on it, then we would create a misleading relationship. Income influences exercise, but a person’s happiness also affects this. Exercise is a collider because both the predictor and outcome variable of interest influence it.
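A small simulation can show how conditioning on a collider creates a misleading relationship. This is a sketch under strong, made-up assumptions: to make the spurious association easy to see, income and happiness are simulated as unrelated, exercise is simply their sum plus noise, and conditioning is done by keeping only people who exercise a lot.

```python
import random
import statistics


def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


random.seed(853)
n = 20_000

# Assumption for illustration: income and happiness are unrelated here.
income = [random.gauss(0, 1) for _ in range(n)]
happiness = [random.gauss(0, 1) for _ in range(n)]

# Exercise is the collider: both income and happiness feed into it.
exercise = [x + y + random.gauss(0, 1) for x, y in zip(income, happiness)]

# Without conditioning, there is essentially no relationship.
overall = correlation(income, happiness)

# Condition on the collider by keeping only people who exercise a lot.
keep = [i for i in range(n) if exercise[i] > 1]
selected = correlation(
    [income[i] for i in keep],
    [happiness[i] for i in keep],
)

print(f"Correlation, everyone:            {overall:.2f}")
print(f"Correlation, frequent exercisers: {selected:.2f}")
```

Among frequent exercisers, a person with low income must tend to be happier to have ended up in that group, and vice versa, so a negative relationship appears even though none was simulated.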

We will be clear about this: we must create the DAG ourselves, in the same way that we must put together the model ourselves. There is nothing that will create it for us. This means that we need to think carefully about the situation. Because it is one thing to see something in the DAG and then do something about it, but it is another to not even know that it is there. McElreath ([2015] 2020, 180) describes these as haunted DAGs. DAGs are helpful, but they are just a tool to help us think deeply about our situation.

When we are building models, it can be tempting to include as many predictor variables as possible. DAGs show clearly why we need to be more thoughtful. For instance, if a variable is a confounder, then we would want to adjust for it, whereas if a variable was a collider then we would not. We can never know the truth, and we are informed by aspects such as theory, what we are interested in, research design, limitations of the data, or our own limitations as researchers, to name a few. Knowing the limits is as important as reporting the model. Data and models with flaws are still useful, if you acknowledge those flaws. The work of thinking about a situation is never done, and relies on others, which is why we need to make all our work as reproducible as possible.
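The three roles can be read off the direction of the arrows. The sketch below encodes that reading for the simplest case, a three-variable DAG written as a list of directed edges. The function name `classify`, the lookup `ADJUST`, and the confounder example with a variable called `Z` are hypothetical and exist only for this illustration. It looks only at direct arrows, so it is not a substitute for thinking through longer paths in a larger DAG.

```python
def classify(edges, exposure, outcome, third):
    """Classify `third` relative to exposure -> outcome using direct arrows only."""
    arrows = set(edges)
    causes_exposure = (third, exposure) in arrows
    causes_outcome = (third, outcome) in arrows
    caused_by_exposure = (exposure, third) in arrows
    caused_by_outcome = (outcome, third) in arrows

    if causes_exposure and causes_outcome:
        return "confounder"
    if caused_by_exposure and causes_outcome:
        return "mediator"
    if caused_by_exposure and caused_by_outcome:
        return "collider"
    return "other"


# Whether to adjust, assuming interest is in the total effect of the exposure.
ADJUST = {"confounder": True, "mediator": False, "collider": False}

# Edges taken from Figure 4.3 and Figure 4.4.
FIGURE_4_3 = [
    ("Income", "Happiness"),
    ("Income", "Children"),
    ("Children", "Happiness"),
]
FIGURE_4_4 = [
    ("Income", "Happiness"),
    ("Income", "Exercise"),
    ("Happiness", "Exercise"),
]
# Hypothetical confounder, named Z only for this sketch.
HYPOTHETICAL = [("Income", "Happiness"), ("Z", "Income"), ("Z", "Happiness")]

for edges, third in [
    (FIGURE_4_3, "Children"),
    (FIGURE_4_4, "Exercise"),
    (HYPOTHETICAL, "Z"),
]:
    role = classify(edges, "Income", "Happiness", third)
    print(f"{third}: {role}, adjust = {ADJUST[role]}")
```

The code can only classify what has been drawn. A variable that is missing from the DAG is missing from the edge list too, which is the sense in which we must create the DAG ourselves.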

4.5 Components of a paper

I had not indeed published anything before I commenced The Professor, but in many a crude effort, destroyed almost as soon as composed, I had got over any such taste as I might once have had for ornamented and redundant composition, and come to prefer what was plain and homely.

The Professor (Brontë 1857)

We discuss the following components: title, abstract, introduction, data, model, results, discussion, figures, tables, equations, and technical terms.¹ Throughout the paper try to be as brief and specific as possible. Most readers will not get past the title. Almost no one will read more than the abstract. Section and sub-section headings, as well as graph and table captions should work on their own, without the surrounding text, because that type of skimming is how many people read papers (Keshav 2007).

4.5.1 Title

A title is the first opportunity that we have to engage our reader in our story. Ideally, we are able to tell our reader exactly what we found. Effective titles are critical because otherwise papers could be ignored by readers. While a title does not have to be “cute”, it does need to be meaningful. This means it needs to make the story clear.

One example of a title that is good enough is “On the 2016 Brexit referendum”. This title is useful because the reader knows what the paper is about. But it is not particularly informative or enticing. A slightly better title could be “On the Vote Leave outcome in the 2016 Brexit referendum”. This variant adds informative specificity. We argue the best title would be something like “Vote Leave outperforms in rural areas in the 2016 Brexit referendum: Evidence from a Bayesian hierarchical model”. Here the reader knows the approach of the paper and also the main take-away.

We will consider a few examples of particularly effective titles. Hug et al. (2019) use “National, regional, and global levels and trends in neonatal mortality between 1990 and 2017, with scenario-based projections to 2030: a systematic analysis”. Here it is clear what the paper is about and the methods that are used. R. Alexander and Alexander (2021) use “The Increased Effect of Elections and Changing Prime Ministers on Topics Discussed in the Australian Federal Parliament between 1901 and 2018”. The main finding is clear from the title, along with a good deal of information about what the content will be. M. Alexander, Kiang, and Barbieri (2018) use “Trends in Black and White Opioid Mortality in the United States, 1979–2015”; Frei and Welsh (2022) use “How the closure of a US tax loophole may affect investor portfolios”. Possibly one of the best titles ever is Bickel, Hammel, and O’Connell (1975) “Sex Bias in Graduate Admissions: Data from Berkeley: Measuring bias is harder than is usually assumed, and the evidence is sometimes contrary to expectation”, which we return to in Chapter 15.

A title is often among the last aspects of a paper to be finalized. While getting through the first draft, we typically use a working title that gets the job done. We then refine it over the course of redrafting. The title needs to reflect the final story of the paper, and this is not usually something that we know at the start. We must strike a balance between getting our reader interested enough to read the paper, and conveying enough of the content so as to be useful (Hayot 2014). Two excellent examples are The History of England from the Accession of James the Second by Thomas Babington Macaulay, and A History of the English-Speaking Peoples by Winston Churchill. Both are clear about what the content is, and, for their target audience, spark interest.

One specific approach is the form: “Exciting content: Specific content”, for instance, “Returning to their roots: Examining the performance of Vote Leave in the 2016 Brexit referendum”. Kennedy and Gelman (2021) provide a particularly nice example of this approach with “Know your population and know your model: Using model-based regression and poststratification to generalize findings beyond the observed sample”, as does Craiu (2019) with “The Hiring Gambit: In Search of the Twofer Data Scientist”. A close variant of this is “A question? And an approach”. For instance, Cahill, Weinberger, and Alkema (2020) use “What increase in modern contraceptive use is needed in FP2020 countries to reach 75% demand satisfied by 2030? An assessment using the Accelerated Transition Method and Family Planning Estimation Model”. As you gain experience with this variant, it becomes possible to know when it is appropriate to drop the approach part yet remain effective, such as Briggs (2021) with “Why Does Aid Not Target the Poorest?”. Another specific approach is “Specific content then broad content” or the inverse. For instance, “Rurality, elites, and support for Vote Leave in the 2016 Brexit referendum” or “Support for Vote Leave in the 2016 Brexit referendum, rurality and elites”. This approach is used by Tolley and Paquet (2021) with “Gender, municipal party politics, and Montreal’s first woman mayor”.

Sometimes it is possible to include a subtitle. When this is possible, a great way to take advantage of this is to use it to include some detail of the main quantitative result that you found. Getting the right level of detail and abstraction about that result is difficult and will require re-writing and getting others’ opinions.

4.5.2 Abstract

For a ten-to-fifteen-page paper, a good abstract is a three-to-five sentence paragraph. For a longer paper the abstract can be slightly longer. The abstract needs to specify the story of the paper. It must also convey what was done and why it matters. To do so, an abstract typically touches on the context of the work, its objectives, approach, and findings.

More specifically, a good recipe for an abstract is: first sentence: specify the general area of the paper and encourage the reader; second sentence: specify the dataset and methods at a general level; third sentence: specify the headline result; and a fourth sentence about implications.

We see this pattern in a variety of abstracts. For instance, Tolley and Paquet (2021) draw in the reader with their first sentence by mentioning the election of the first woman mayor in 400 years. The second sentence is clear about what is done in the paper. The third sentence tells the reader how it is done i.e. a survey, and the fourth sentence adds some detail. The fifth and final sentence makes the main take-away clear.

In 2017, Montreal elected Valérie Plante, the first woman mayor in the city’s 400-year history. Using this election as a case study, we show how gender did and did not influence the outcome. A survey of Montreal electors suggests that gender was not a salient factor in vote choice. Although gender did not matter much for voters, it did shape the organization of the campaign and party. We argue that Plante’s victory can be explained in part by a strategy that showcased a less leader-centric party and a degendered campaign that helped counteract stereotypes about women’s unsuitability for positions of political leadership.

Similarly, Beauregard and Sheppard (2021) make the broader environment clear within the first two sentences, and the specific contribution of this paper to that environment. The third and fourth sentences make the data source and main findings clear. The fifth and sixth sentences add specificity that would be of interest to likely readers of this abstract i.e. academic political scientists. In the final sentence, the position of the authors is made clear.

Previous research on support for gender quotas focuses on attitudes toward gender equality and government intervention as explanations. We argue the role of attitudes toward women in understanding support for policies aiming to increase the presence of women in politics is ambivalent—both hostile and benevolent forms of sexism contribute in understanding support, albeit in different ways. Using original data from a survey conducted on a probability-based sample of Australian respondents, our findings demonstrate that hostile sexists are more likely to oppose increasing of women’s presence in politics through the adoption of gender quotas. Benevolent sexists, on the other hand, are more likely to support these policies than respondents exhibiting low levels of benevolent sexism. We argue this is because benevolent sexism holds that women are pure and need protection; they do not have what it takes to succeed in politics without the assistance of quotas. Finally, we show that while women are more likely to support quotas, ambivalent sexism has the same relationship with support among both women and men. These findings suggest that aggregate levels of public support for gender quotas do not necessarily represent greater acceptance of gender equality generally.

Another excellent example of an abstract is Sides, Vavreck, and Warshaw (2021). In just five sentences, they make it clear what they do, how they do it, what they find, and why it is important.

We provide a comprehensive assessment of the influence of television advertising on United States election outcomes from 2000–2018. We expand on previous research by including presidential, Senate, House, gubernatorial, Attorney General, and state Treasurer elections and using both difference-in-differences and border-discontinuity research designs to help identify the causal effect of advertising. We find that televised broadcast campaign advertising matters up and down the ballot, but it has much larger effects in down-ballot elections than in presidential elections. Using survey and voter registration data from multiple election cycles, we also show that the primary mechanism for ad effects is persuasion, not the mobilization of partisans. Our results have implications for the study of campaigns and elections as well as voter decision making and information processing.

The best abstracts will have such a high content to words ratio that they may even feel a little terse. For instance, in the abstract of Touvron et al. (2023), there is not a word that is wasted and they communicate a large amount of information in only four sentences.

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

Kasy and Teytelboym (2023) provide an excellent example of a more statistical abstract. They clearly identify what they do and why it is important.

We consider an experimental setting in which a matching of resources to participants has to be chosen repeatedly and returns from the individual chosen matches are unknown but can be learned. Our setting covers two-sided and one-sided matching with (potentially complex) capacity constraints, such as refugee resettlement, social housing allocation, and foster care. We propose a variant of the Thompson sampling algorithm to solve such adaptive combinatorial allocation problems. We give a tight, prior-independent, finite-sample bound on the expected regret for this algorithm. Although the number of allocations grows exponentially in the number of matches, our bound does not. In simulations based on refugee resettlement data using a Bayesian hierarchical model, we find that the algorithm achieves half of the employment gains (relative to the status quo) that could be obtained in an optimal matching based on perfect knowledge of employment probabilities.

Finally, Briggs (2021) begins with a claim that seems unquestionably true. In the second sentence he then says that it is false! The third sentence specifies the extent of this claim, and the fourth sentence details how he comes to this position, before providing more detail. The final two sentences speak to broader implications and importance.

Foreign-aid projects typically have local effects, so they need to be placed close to the poor if they are to reduce poverty. I show that, conditional on local population levels, World Bank (WB) project aid targets richer parts of countries. This relationship holds over time and across world regions. I test five donor-side explanations for pro-rich targeting using a pre-registered conjoint experiment on WB Task Team Leaders (TTLs). TTLs perceive aid-receiving governments as most interested in targeting aid politically and controlling implementation. They also believe that aid works better in poorer or more remote areas, but that implementation in these areas is uniquely difficult. These results speak to debates in distributive politics, international bargaining over aid, and principal-agent issues in international organizations. The results also suggest that tweaks to WB incentive structures to make ease of project implementation less important may encourage aid to flow to poorer parts of countries.

Nature, a scientific journal, provides a guide for constructing an abstract. They recommend a structure that results in an abstract of six parts and adds up to around 200 words:

An introductory sentence that is comprehensible to a wide audience.
A more detailed background sentence that is relevant to likely readers.
A sentence that states the general problem.
Sentences that summarize and then explain the main results.
A sentence about general context.
And finally, a sentence about the broader perspective.

The first sentence of an abstract should not be vacuous. Assuming the reader continued past the title, this first sentence is the next opportunity that we have to implore them to keep reading our paper. And then the second sentence of the abstract, and so on. Work and re-work the abstract until it is so good that you would be fine if that was the only thing that was read, because that will often be the case.

4.5.3 Introduction

An introduction needs to be self-contained and convey everything that a reader needs to know. We are not writing a mystery story. Instead, we want to give away the most important points in the introduction. For a ten-to-fifteen-page paper, an introduction may be two or three paragraphs of main content. Hayot (2014, 90) says the goal of an introduction is to engage the reader, locate them in some discipline and background, and then tell them what happens in the rest of the paper. It should be completely reader-focused.

The introduction should set the scene and give the reader some background. For instance, we typically start a little broader. This provides some context to the paper. We then describe how the paper fits into that context, and give some high-level results, especially focused on the one key result that is the main part of the story. We provide more detail here than we provided in the abstract, but not the full extent. And we broadly discuss next steps in a sentence or two. Finally, we finish the introduction with an additional short final paragraph that highlights the structure of the paper.

As an example (with made-up details):

The UK Conservative Party has always done well in rural electorates. And the 2016 Brexit vote was no different with a significant difference in support between rural and urban areas. But even by the standard of rural support for conservative issues, support for “Vote Leave” was unusually strong with “Vote Leave” being most heavily supported in the East Midlands and the East of England, while the strongest support for “Remain” was in Greater London.

In this paper we look at why the performance of “Vote Leave” in the 2016 Brexit referendum was so correlated with rurality. We construct a model in which support for “Vote Leave” at a voting area level is explained by the number of farms in the area, the average internet connectivity, and the median age. We find that as the median age of an area increases, the likelihood that an area supported “Vote Leave” decreases by 14 percentage points. Future work could look at the effect of having a Conservative MP which would allow a more nuanced understanding of these effects.

The remainder of this paper is structured as follows: Section 2 discusses the data, Section 3 discusses the model, Section 4 presents the results, and finally Section 5 discusses our findings and some weaknesses.

The introduction needs to be self-contained and tell the reader almost everything that they need to know. A reader should be able to only read the introduction and have an accurate picture of all the major aspects of the whole paper. It would be rare to include graphs or tables in the introduction. An introduction should close by telegraphing the structure of the paper.

4.5.4 Data

Robert Caro, Lyndon Johnson’s biographer, describes the importance of conveying “a sense of place” when writing a biography (Caro 2019, 141). He defines this as “the physical setting in which a book’s action is occurring: to see it clearly enough, in sufficient detail, so that he feels as if he himself were present while the action is occurring.” He provides the following example:

When Rebekah walked out the front door of that little house, there was nothing—a roadrunner streaking behind some rocks with something long and wet dangling from his beak, perhaps, or a rabbit disappearing around a bush so fast that all she really saw was the flash of a white tail—but otherwise nothing. There was no movement except for the ripple of the leaves in the scattered trees, no sound except for the constant whisper of the wind\(\dots\) If Rebekah climbed, almost in desperation, the hill in the back of the house, what she saw from its crest was more hills, an endless vista of hills, hills on which there was visible not a single house\(\dots\) hills on which nothing moved, empty hills with, above them, empty sky; a hawk circling silently overhead was an event. But most of all, there was nothing human, no one to talk to.

Caro (2019, 146)

How thoroughly we can imagine the circumstances of Johnson’s mother, Rebekah Baines Johnson. When writing our papers, we need to achieve that same sense of place, for our data, as Caro provides for the Hill Country. We do this by being as explicit as possible. We typically have a whole section about the data and this is designed to show the reader, as closely as possible, the actual data that underpin our story.

When writing the data section, we are beginning our answer to the critical question about our claim, which is, how is it possible to know this? (McPhee 2017, 78). An excellent example of a data section is provided by Doll and Hill (1950). They are interested in the effect of smoking between control and treatment groups. After clearly describing their dataset they use tables to display relevant cross-tabs and graphs to contrast groups.

In the data section we need to thoroughly discuss the variables in the dataset that we are using. If there are other datasets that could have been used, but were not, then this should be mentioned and the choice justified. If variables were constructed or combined, then this process and motivation should be explained.

We want the reader to understand what the data that underpin the results look like. This means that we should graph the data that are used in our analysis, or as close to them as possible. And we should also include tables of summary statistics. If the dataset was created from some other source, then it can also help to include an example of that original source. For instance, if the dataset was created from survey responses then the underlying survey questions should be included in an appendix.

Some judgment is required when it comes to the figures and tables in the data section. The reader should have the opportunity to understand the details, but it may be that some are better placed in an appendix. Figures and tables are a critical aspect of convincing people of a story. In a graph we can show the data and then let the reader decide for themselves. And using a table, we can summarize a dataset. At the very least, every variable should be shown in a graph and summarized in a table. If there are too many, then some of these could be relegated to an appendix, with the critical relationships shown in the main body. Figures and tables should be numbered and then cross-referenced in the text, for instance, “Figure 1 shows\(\dots\)”, “Table 1 describes\(\dots\)”. For every graph and table there should be accompanying text that describes their main aspects, and adds additional detail.

We discuss the components of graphs and tables, including titles and labels, in Chapter 5. But here we will discuss captions, as they are between the text and the graph or table. Captions need to be informative and self-contained. Borkin et al. (2015) use eye-tracking to understand how visualizations are recognized and recalled. They find that captions need to make the central message of the figure clear, and that there should be redundancy. As Cleveland ([1985] 1994, 57) says, the “interplay between graph, caption, and text is a delicate one”. However, the reader should be able to read only the caption and understand what the graph or table shows. A caption that is two lines long is not necessarily inappropriate. And all aspects of the graph or table should be explained. For instance, consider Figure 4.5 (a) and Figure 4.5 (b), both from Bowley (1901, 151). They are clear, and self-contained.

The choice between a table and a graph comes down to how much information is to be conveyed. In general, if there is specific information that should be considered, such as a summary statistic, then a table is a good option. If we are interested in the reader making comparisons and understanding trends, then a graph is a good option (Gelman, Pasarica, and Dodhia 2002).

4.5.5 Model

We often build a statistical model that we will use to explore the data, and it is normal to have a specific section about this. At a minimum you should specify the equations that describe the model being used and explain their components with plain language and cross-references.

The model section typically begins with the model being written out, explained, and justified. Depending on the expected reader, some background may be needed. After specifying the model with appropriate mathematical notation and cross-referencing it, the components of the model should then be defined and explained. Try to define each aspect of the notation. This helps convince the reader that the model was well-chosen and enhances the credibility of the paper. The model’s variables should correspond to those that were discussed in the data section, making a clear link between the two sections.
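As a sketch of what this can look like, consider the made-up Brexit example from the introduction, in which support for “Vote Leave” in a voting area is explained by the number of farms, the average internet connectivity, and the median age. The choice of a logistic regression, the symbols, and the variable names below are assumptions made for illustration and not a recommended specification.

```latex
\begin{aligned}
y_i \mid \pi_i &\sim \mbox{Bernoulli}(\pi_i) \\
\mbox{logit}(\pi_i) &= \beta_0 + \beta_1\,\mbox{farms}_i + \beta_2\,\mbox{internet}_i + \beta_3\,\mbox{age}_i
\end{aligned}
```

Each piece of notation would then be defined in the text. Here, \(y_i\) is 1 if voting area \(i\) supported “Vote Leave” and 0 otherwise, \(\pi_i\) is the probability of that outcome, \(\mbox{farms}_i\) is the number of farms in the area, \(\mbox{internet}_i\) is the average internet connectivity, \(\mbox{age}_i\) is the median age, \(\beta_0\) is the intercept, and \(\beta_1\), \(\beta_2\), and \(\beta_3\) are the coefficients to be estimated. Every variable that appears in the equation should already have been introduced in the data section.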

There should be some discussion of how features enter the model and why. Some examples could include:

Why use age rather than age-groups?
Why does state/province have a levels effect?
Why is gender a categorical variable?

In general, we are trying to convey a sense that this is the appropriate model for the situation. We want the reader to understand how the aspects that were discussed in the data section assert themselves in the modeling decisions that were made.

The model section should close with some discussion of the assumptions that underpin the model. It should also have a brief discussion of alternative models or variants. You want the strengths and weaknesses to be clear and for the reader to know why this particular model was chosen.

At some point in this section, it is usually appropriate to specify the software that was used to run the model, and to provide some evidence of thought about the circumstances in which the model may not be appropriate. That second point would typically be expanded on in the discussion section. And there should be evidence of model validation and checking, model convergence, and/or diagnostic issues. Again, there is a balance needed here, and some of this content may be more appropriately placed in appendices.

When technical terms are used, they should be briefly explained in plain language for readers who might not be familiar with them. For instance, M. Alexander (2019) integrates an explanation of the Gini coefficient that brings the reader along.

To look at the concentration of baby names, let’s calculate the Gini coefficient for each country, sex and year. The Gini coefficient measures dispersion or inequality among values of a frequency distribution. It can take any value between 0 and 1. In the case of income distributions, a Gini coefficient of 1 would mean one person has all the income. In this case, a Gini coefficient of 1 would mean that all babies have the same name. In contrast, a Gini coefficient of 0 would mean names are evenly distributed across all babies.
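A plain-language explanation can be paired with a short calculation so the reader can check their understanding. The following is a minimal sketch and not the code used by M. Alexander (2019): the function name `gini` and the toy counts are hypothetical, and it uses one common formula for the Gini coefficient of a list of non-negative values, here the number of babies given each name.

```python
def gini(counts):
    """Gini coefficient of non-negative values, using the sorted-rank formula."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    # G = 2 * sum(rank * value) / (n * total) - (n + 1) / n, with ranks from 1.
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Four possible names, 20 babies, spread evenly across the names.
print(gini([5, 5, 5, 5]))

# Four possible names, but every baby is given the same one.
print(gini([0, 0, 0, 20]))
```

With this formula an even spread gives 0, and complete concentration gives \((n-1)/n\), where \(n\) is the number of names, which approaches 1 as the number of names grows.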

There may be papers that do not include a statistical model. In that case, this “Model” section should be replaced by a broader “Methodology” section. It might describe the simulation that was conducted, or contain more general details about the approach.

4.5.6 Results

Two excellent examples of results sections are provided by Kharecha and Hansen (2013) and Kiang et al. (2021). In the results section, we want to communicate the outcomes of the analysis in a clear way and without too much focus on the discussion of implications. The results section likely requires summary statistics, tables, and graphs. Each of those aspects should be cross-referenced and have text associated with them that details what is seen in each figure. This section should relay results; that is, we are interested in what the results are, rather than what they mean.

This section would also typically include tables or graphs of coefficient estimates based on the modeling. Various features of the estimates should be discussed, and differences between the models explained. It may be that different subsets of the data are considered separately. Again, all graphs and tables need to have text in plain language accompanying them. A rough guide is that the amount of text should be at least equal to the amount of space taken up by the tables and graphs. For instance, if a full page is used to display a table of coefficient estimates, then that should be cross-referenced and accompanied by about a full page of text about that table.

4.5.7 Discussion

A discussion section may be the final section of a paper and would typically have four or five sub-sections.

The discussion section would typically begin with a sub-section that comprises a brief summary of what was done in the paper. This would be followed by two or three sub-sections that are devoted to the key things that we learn about the world from this paper. These sub-sections are the main opportunity to justify or detail the implications of the story being told in the paper. Typically, these sub-sections do not see newly introduced graphs or tables, but are instead focused on what we learn from those that were introduced in earlier sections. It may be that some of the results are discussed in relation to what others have found, and this is the place to try to reconcile any differences.

Following these sub-sections of what we learn about the world, we would typically have a sub-section focused on some of the weaknesses of what was done. This could concern aspects such as the data that were used, the approach, and the model. In the case of the model we are especially concerned with those aspects that might affect the findings. This can be especially difficult in the case of machine learning models, and Smith et al. (2022) provide guidance for aspects to consider. And the final sub-section is typically a few paragraphs that specify what is left to learn, and how future work could proceed.

In general, we would expect this section to take at least 25 per cent of the total paper. This means that in an eight-page paper we would expect at least two pages of discussion.

4.5.8 Brevity, typos, and grammar

Brevity is important. This is partly because we write for the reader, and the reader has other priorities. But it is also because as the writer it forces us to consider what our most important points are, how we can best support them, and where our arguments are weakest. Jean Chrétien is a former Canadian prime minister. In Chrétien (2007, 105) he wrote that he used to ask “\(\dots\)the officials to summarize their documents in two or three pages and attach the rest of the materials as background information. I soon discovered that this was a problem only for those who didn’t really know what they were talking about”.

This experience is not unique to Canada and it is not new. In Hughes and Rutter (2016) Oliver Letwin, the former British cabinet member, describes there being “a huge amount of terrible guff, at huge, colossal, humongous length coming from some departments” and how he asked “for them to be one quarter of the length”. He found that the departments were able to accommodate this request without losing anything important. Winston Churchill asked for brevity during the Second World War, saying “the discipline of setting out the real points concisely will prove an aid to clearer thinking.” The letter from Szilard and Einstein to FDR that was the catalyst for the Manhattan Project was only two pages!

Zinsser (1976) goes further and describes “the secret of good writing” being “to strip every sentence to its cleanest components.” Every sentence should be simplified to its essence. And every word that does not contribute should be removed.

Unnecessary words, typos, and grammatical issues should be removed from papers. These mistakes affect the credibility of claims. If the reader cannot trust you to use a spell-checker, then why should they trust you to use logistic regression? RStudio has a spell-checker built in, but Microsoft Word and Google Docs are useful additional checks. Copy from the Quarto document and paste into Word, then look for the red and green lines, and fix them in the Quarto document.

We are not worried about the n-th degree of grammatical content. Instead, we are interested in grammar and sentence structure that occurs in conversational language use (S. King 2000, 118). The way to develop comfort is by reading widely and asking others to also read your work. Another useful tactic is to read your writing aloud, which can be useful for detecting odd sentences based on how they sound. One small aspect to check that will regularly come up is that any number from one to ten should be written as words, while 11 and over should be written as numbers.
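When numbers are inserted into text by code, the rule about writing one to ten as words can be applied with a small helper. The following is a minimal sketch under stated assumptions, not a function from any package: `number_in_prose()` is a hypothetical name, and it assumes it is given a single numeric value.

```r
# Hypothetical helper (not from any package): spell out whole numbers from
# one to ten, and return everything else as numerals.
number_in_prose <- function(x) {
  stopifnot(is.numeric(x), length(x) == 1)
  words <- c(
    "one", "two", "three", "four", "five",
    "six", "seven", "eight", "nine", "ten"
  )
  is_small_whole <- x >= 1 && x <= 10 && x == round(x)
  if (is_small_whole) {
    words[as.integer(x)]
  } else {
    # Use a comma as the thousands separator and avoid scientific notation
    format(x, big.mark = ",", scientific = FALSE)
  }
}

number_in_prose(8)    # "eight"
number_in_prose(11)   # "11"
number_in_prose(1000) # "1,000"
```

In a Quarto document a helper like this could be called from inline R code, so that the text stays consistent with the rule even when the underlying data change.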

4.5.9 Rules

A variety of authors have established rules for writing. This famously includes those of Orwell (1946), which were reimagined by The Economist (2013). A further reimagining of rules for writing, focused on telling stories with data, could be:

Focus on the reader and their needs. Everything else is commentary.
Establish a structure and then rely on that to tell the story.
Write a first draft as quickly as possible.
Rewrite that draft extensively.
Be concise and direct. Remove as many words as possible.
Use words precisely. For instance, stock prices rise or fall, rather than improve or worsen.
Use short sentences where possible.
Avoid jargon.
Write as though your work will be on the front page of a newspaper.
Never claim novelty or that you are the “first to study X”—there is always someone else who got there first.

Fiske and Kuriwaki (2021) have a list of rules for scientific papers and the appendix of Pineau et al. (2021) provides a checklist for machine learning papers. But perhaps the last word should be from Savage and Yeh (2019):

[T]ry to write the best version of your paper: the one that you like. You can’t please an anonymous reader, but you should be able to please yourself. Your paper—you hope—is for posterity.

Savage and Yeh (2019, 442)

4.6 Exercises

Practice

(Plan) Consider the following scenario: A child and their parent watch streetcars from their apartment window. Every hour, for eight hours, they record the number of streetcars that go past. Please sketch what a dataset could look like, and then sketch a graph that you could build to show all observations.
(Simulate) Please further consider the scenario described and simulate the situation. Then write five tests based on the simulated data.
(Acquire) Please specify a source of actual data about some aspect of public transportation in a city that you are interested in.
(Explore) Build a graph and table using the simulated data.
(Share) Please write some text to accompany the graph and table, as if they reflected the actual situation. The exact details contained in the paragraphs do not have to be factual but they should be reasonable (i.e. you do not actually have to get the data nor create the graphs). Separate the code appropriately into R files and a Quarto doc. Submit a link to a GitHub repo with a README.

Quiz

What are three features of a good research question (write a paragraph or two)?
How do Light, Singer, and Willett (1990) recommend going from a broad theme to planning a study in detail (pick one)?
1. Talk to experts.
2. Identify available data.
3. Articulate a set of specific research questions.
Why do Light, Singer, and Willett (1990) believe research questions are so important (select all that apply)?
1. They are the only basis for making sensible planning decisions.
2. They identify the target population from which you will draw a sample.
3. They determine the appropriate level of aggregation.
4. They identify the outcome variable.
5. They identify the key predictors.
6. They raise challenges for measurement and data collection.
From Light, Singer, and Willett (1990), what is the purpose of the “spiral of theory and data” in research (pick one)?
1. To collect the data before developing theories.
2. To iteratively refine both theory and data by moving between them.
3. To ensure data collection is completed before any theoretical analysis.
4. To focus solely on theoretical frameworks without data.
In the context of research approaches, what does data-first mean (pick one)?
1. Developing research questions without considering data availability.
2. Collecting new data specifically designed to answer a predefined question.
3. Prioritizing theoretical frameworks over empirical evidence.
4. Starting with available data and then determining the questions that can be answered.
What is an advantage of a data-first approach (pick one)?
1. It eliminates the need for theoretical frameworks.
2. It allows researchers to formulate questions based on available data.
3. It guarantees causal relationships can be established.
4. It prevents any form of bias in the research.
What is a disadvantage of a data-first approach (pick one)?
1. The concern that you are “searching under the streetlight”.
2. The concern about being able to contribute to theory.
3. The concern that causality will be difficult to tease apart.
4. The concern about external validity.
What is a counterfactual (include examples and references and write at least three paragraphs)?
What is a counterfactual (pick one)?
1. An alternative hypothesis that contradicts the main theory.
2. An if-then statement in which the if is false.
3. A fact that counters the main argument of the paper.
4. A statistical method used to adjust for confounding variables.
What does the “FINER” framework stand for (pick one)?
1. Flexible, Innovative, Neutral, Empirical, Replicable.
2. Formal, Interpretive, New, Experimental, Robust.
3. Focused, Integrated, Natural, Efficient, Reliable.
4. Feasible, Interesting, Novel, Ethical, Relevant.
What is an estimand (pick one)?
1. A variable that is measured with error.
2. A biased estimator.
3. The process of using data to calculate an estimate.
4. The true effect or quantity of interest that we aim to estimate.
What is an estimand (pick one)?
1. A rule for calculating an estimate of a given quantity based on observed data.
2. The object of inquiry.
3. A result given a particular dataset and approach.
What is an estimator (pick one)?
1. A rule for calculating an estimate of a given quantity based on observed data.
2. The object of inquiry.
3. A result given a particular dataset and approach.
What is the role of an estimator (pick one)?
1. It is the true effect we aim to estimate.
2. It is a rule or method for calculating an estimate from data.
3. It is a calculated value given a dataset and method.
4. It is an error term in the statistical model.
What is an estimate (pick one)?
1. A rule for calculating an estimate of a given quantity based on observed data.
2. The object of inquiry.
3. A result given a particular dataset and approach.
What is selection bias (pick one)?
1. When participants drop out of a study over time.
2. When results are affected by how data are measured.
3. When the sample is not representative of the population.
4. When variables are not properly controlled in an experiment.
What is measurement bias (pick one)?
1. When data are inaccurately recorded due to equipment failure.
2. When the data collection method systematically overstates or understates the true value.
3. When the process of measuring influences the results.
4. When the sample size is too small to draw conclusions.
What is the purpose of Directed Acyclic Graphs (DAGs) (pick one)?
1. To create random samples from complex populations.
2. To perform statistical tests on non-linear data.
3. To automatically generate statistical models.
4. To visually represent causal relationships between variables.
What is a benefit of building a DAG (pick one)?
1. They automatically identify causal relationships in data.
2. They eliminate the need for statistical analysis.
3. They help researchers think carefully about variable relationships.
4. They provide precise estimates of causal effects.
What is a confounder (pick one)?
1. A variable that is influenced by both the predictor and outcome variable.
2. A variable that affects both the predictor and outcome variables.
3. A variable that is affected by the predictor and affects the outcome variable.
What is a mediator (pick one)?
1. A variable that is influenced by both the predictor and outcome variable.
2. A variable that affects both the predictor and outcome variables.
3. A variable that is affected by the predictor and affects the outcome variable.
What is a collider (pick one)?
1. A variable that is influenced by both the predictor and outcome variable.
2. A variable that affects both the predictor and outcome variables.
3. A variable that is affected by the predictor and affects the outcome variable.
According to Chapter 2 of Zinsser (1976), what is the secret to good writing (pick one)?
1. Correct sentence structure and grammar.
2. The use of long words, adverbs, and passive voice.
3. Strip every sentence to its cleanest components.
4. Thorough planning.
According to Chapter 2 of Zinsser (1976), what must a writer constantly ask (pick one)?
1. Who am I writing for?
2. What am I trying to say?
3. How can this be rewritten?
4. Why does this matter?
What is one of the critical tasks in the process of writing a paper (pick one)?
1. Gathering as much data as possible before starting to write.
2. Spending extensive time perfecting each sentence in the first draft.
3. Getting to a first draft as quickly as possible.
4. Focusing on creating detailed graphs and tables before writing.
Why is getting to a first draft quickly important (pick one)?
1. It ensures that no mistakes are made in the initial draft.
2. It provides a complete version to revise and improve on.
3. It allows the writer to perfect each sentence as they go.
4. It reduces the overall time spent on writing.
What does “kill your darlings” mean (pick one)?
1. To avoid writing about controversial topics.
2. To use harsh criticism to improve your work.
3. To remove unnecessary content you are fond of but that doesn’t serve the main story.
4. To rewrite the entire draft from scratch.
Which of the following is a key benefit of writing for the writer, even when the focus is on the reader (pick one)?
1. It allows the writer to avoid rewriting the paper.
2. It helps the writer to work out what they believe and how they came to believe it.
3. It reduces the amount of feedback required from peers.
4. It ensures that the writer’s work will be published.
Which two repeated words, for instance in Chapter 3, characterize the advice of Zinsser (1976) (pick one)?
1. Rewrite, rewrite.
2. Simplify, simplify.
3. Remove, remove.
4. Less, less.
What is the main reason for removing unnecessary words, typos, and grammatical issues from a paper (pick one)?
1. To meet word count limits.
2. To impress reviewers with advanced vocabulary.
3. To make the paper longer.
4. To enhance the credibility of the claims.
Which of the following is the best title (pick one)?
1. “Standard errors of estimates from small samples”
2. “Standard errors”
3. “Problem Set 2”
What is one strategy for writing a title (pick one)?
1. Use technical jargon to impress expert readers.
2. Include both the general topic and specific information about the main finding.
3. Make it as short as possible, using only one or two words.
4. Pose the title as a question to engage the reader.
Please write a new title for Fourcade and Healy (2017).
Which of the following is NOT recommended when writing an abstract (pick one)?
1. Adding figures or tables to illustrate key points.
2. Including the main results and implications.
3. Using precise and concise language.
4. Making the abstract self-contained.
What is a common structure for writing an abstract (pick one)?
1. Start with implications, then methods, and end with context.
2. First sentence about the general area, second about methods, third about main result, fourth about implications.
3. Begin with limitations, followed by data sources, then results.
4. A series of questions that the paper will answer.
Using only the 1,000 most popular words in the English language, according to the XKCD Simple Writer, rewrite the abstract of Chambliss (1989) so that it retains its original meaning.
According to G. King (2006), what is the key task of subheadings (pick one)?
1. Use acronyms to integrate the paper into the literature.
2. Be broad and sweeping so that a reader is impressed by the importance of the paper.
3. Enable a reader who randomly falls asleep but keeps turning pages to know where they are.
What do you want to achieve from the data section (pick one)?
1. To demonstrate the complexity of the data to impress the reader.
2. To create a sense of place by thoroughly describing the data.
3. To include as many graphs and tables as possible.
4. To hide any weaknesses in the data.
What is the primary goal of the data section in a research paper (pick one)?
1. To present as many tables and graphs as possible.
2. To thoroughly describe the data so the reader understands the basis of the results.
3. To convince the reader of the complexity of the analysis.
4. To discuss all possible data sources, even those not used.
According to G. King (2006), if our standard error was 0.05 then which of the following specificities for a coefficient would be silly (select all that apply)?
1. 2.7182818
2. 2.718282
3. 2.71828
4. 2.7183
5. 2.718
6. 2.72
7. 2.7
8. 3
What should a good figure or table caption accomplish (pick one)?
1. Be as brief as possible, ideally one line.
2. Be self-contained and explain the main take-away.
3. Include complex jargon to demonstrate expertise.
4. Provide minimal information to encourage the reader to read the text.
What should be in the model section (pick one)?
1. A summary of other models used in the literature.
2. Only the final results without any equations.
3. A general description without mathematical notation.
4. The equations, explanations, and definitions of all components.
Why is it important to discuss alternative models or variants in the model section (pick one)?
1. To demonstrate thorough consideration and justify the chosen model.
2. To show that other models are inferior.
3. To confuse the reader with multiple options.
4. To increase the length of the paper.
What is the purpose of the results section (pick one)?
1. Interpreting the results and discussing their implications.
2. Critiquing other researchers’ findings.
3. Proposing future research directions.
4. Presenting the outcomes of the analysis clearly, without extensive interpretation.
In the results section, how should graphs and tables be integrated (pick one)?
1. They should stand alone without any accompanying text.
2. They should be minimized to avoid clutter.
3. They should be cross-referenced and discussed in the text.
4. They should be placed in an appendix to avoid interrupting the flow.
What is the purpose of the discussion section (pick one)?
1. To repeat the results in more detail.
2. To provide a detailed methodology.
3. To interpret the results, discuss implications, and acknowledge weaknesses.
4. To list all the limitations of the study without offering solutions.
Why do Savage and Yeh (2019) recommend asking yourself if it is possible to preserve your original message without that punctuation mark, that word, that sentence, that paragraph or that section (pick one)?
1. To reduce the possibility of errors.
2. To achieve clarity.
3. To keep the paper short.
What are key aspects of the re-drafting process (select all that apply)?
1. Going through it with a red pen to remove unneeded words.
2. Printing the paper and reading a physical copy.
3. Cutting and pasting to enhance flow.
4. Reading it aloud.
5. Exchanging it with others.
Why do poor grammar and typos affect the credibility of a paper (pick one)?
1. They enhance the casual tone of the paper.
2. They make the paper shorter.
3. They indicate a lack of attention to detail.
4. They are acceptable as long as the content is good.

Class activities

Discuss your preferred approach (data-first/question-first/other) to research and why.
Explain, with reference to examples, what is an estimand, estimator, and estimate.
Please consider “selection bias” and include the definition in a sentence in the same way that M. Alexander (2019) does for the Gini coefficient.
Please use ChatGPT, or an equivalent LLM, to create a prompt that answers the question “What is a selection effect?”. With a partner, improve the response by adding context, references, and making it true (if necessary). Discuss three aspects: 1) the prompt, 2) the original answer, 3) your augmented answer.
Pick one of the well-written quantitative papers:
- Write out the original title. What do you like, and not like, about it? Write an alternative title for it.
- Write out the abstract. What do you like, and not like, about it?
- Please prompt ChatGPT, or an equivalent LLM, to create an alternative abstract (copy the prompt so you can discuss it).
- Draw on all of this to put together an improved abstract and then discuss everything.
Make a plan, based on G. King (2006), for how you will write a meaningful paper by the end of this class. (For PhD students: Detail three journals/conferences, in order, that you will submit it to, and why the paper would be a good fit at each.)
Paper review: Please read Gerring (2012) and write a review of one page.

Task

Caro (2019, xii) writes at least 1,000 words almost every day. The purpose of this task is to give you the chance to do that also. Please pick one of the papers specified in the prerequisites and complete the following tasks:

Day 1: Transcribe, by writing each word yourself, the entire introduction.
Day 2: Rewrite the introduction so that it is five lines (or 10 per cent, whichever is less) shorter.
Day 3: Transcribe, by writing each word yourself, the abstract.
Day 4: Rewrite a new, four-sentence, abstract for the paper.
Day 5: Write a second version of your new abstract using only the 1,000 most popular words in the English language, according to the XKCD Simple Writer.
Day 6: Detail three points about the way the paper is written that you like.
Day 7: Detail one point about the way the paper is written that you do not like.

Use Quarto to build up a single PDF over the course of the week. After each day, commit and push your work with an informative commit message. Make use of headings and sub-headings to structure your submission. Submit a link to your high-quality repo.

Alexander, Monica. 2019. “The Concentration and Uniqueness of Baby Names in Australia and the US,” January. https://www.monicaalexander.com/posts/2019-20-01-babynames/.

Alexander, Monica, Mathew Kiang, and Magali Barbieri. 2018. “Trends in Black and White Opioid Mortality in the United States, 1979–2015.” Epidemiology 29 (5): 707–15. https://doi.org/10.1097/EDE.0000000000000858.

Alexander, Rohan, and Monica Alexander. 2021. “The Increased Effect of Elections and Changing Prime Ministers on Topics Discussed in the Australian Federal Parliament Between 1901 and 2018.” https://doi.org/10.48550/arXiv.2111.09299.

Angrist, Joshua, and Jörn-Steffen Pischke. 2010. “The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con Out of Econometrics.” Journal of Economic Perspectives 24 (2): 3–30. https://doi.org/10.1257/jep.24.2.3.

Arel-Bundock, Vincent. 2024. tinytable: Simple and Configurable Tables in “HTML,” “LaTeX,” “Markdown,” “Word,” “PNG,” “PDF,” and “Typst” Formats. https://vincentarelbundock.github.io/tinytable/.

Barrett, Malcolm. 2021. ggdag: Analyze and Create Elegant Directed Acyclic Graphs. https://CRAN.R-project.org/package=ggdag.

Barron, Alexander, Jenny Huang, Rebecca Spang, and Simon DeDeo. 2018. “Individuals, Institutions, and Innovation in the Debates of the French Revolution.” Proceedings of the National Academy of Sciences 115 (18): 4607–12. https://doi.org/10.1073/pnas.1717729115.

Beauregard, Katrine, and Jill Sheppard. 2021. “Antiwomen but Proquota: Disaggregating Sexism and Support for Gender Quota Policies.” Political Psychology 42 (2): 219–37. https://doi.org/10.1111/pops.12696.

Bickel, Peter, Eugene Hammel, and William O’Connell. 1975. “Sex Bias in Graduate Admissions: Data from Berkeley: Measuring Bias Is Harder Than Is Usually Assumed, and the Evidence Is Sometimes Contrary to Expectation.” Science 187 (4175): 398–404. https://doi.org/10.1126/science.187.4175.398.

Birkmeyer, John, Jonathan Finks, Amanda O’Reilly, Mary Oerline, Arthur Carlin, Andre Nunn, Justin Dimick, Mousumi Banerjee, and Nancy Birkmeyer. 2013. “Surgical Skill and Complication Rates After Bariatric Surgery.” New England Journal of Medicine 369 (15): 1434–42. https://doi.org/10.1056/nejmsa1300625.

Bland, Martin, and Douglas Altman. 1986. “Statistical Methods for Assessing Agreement Between Two Methods of Clinical Measurement.” The Lancet 327 (8476): 307–10. https://doi.org/10.1016/S0140-6736(86)90837-8.

Borkin, Michelle, Zoya Bylinskii, Nam Wook Kim, Constance May Bainbridge, Chelsea Yeh, Daniel Borkin, Hanspeter Pfister, and Aude Oliva. 2015. “Beyond Memorability: Visualization Recognition and Recall.” IEEE Transactions on Visualization and Computer Graphics 22 (1): 519–28. https://doi.org/10.1109/TVCG.2015.2467732.

Bowley, Arthur Lyon. 1901. Elements of Statistics. London: P. S. King.

Briggs, Ryan. 2021. “Why Does Aid Not Target the Poorest?” International Studies Quarterly 65 (3): 739–52. https://doi.org/10.1093/isq/sqab035.

Bronner, Laura. 2021. “Quantitative Editing.” YouTube, June. https://youtu.be/LI5m9RzJgWc.

Brontë, Charlotte. 1857. The Professor. https://www.gutenberg.org/files/1028/1028-h/1028-h.htm.

Bueno de Mesquita, Ethan, and Anthony Fowler. 2021. Thinking Clearly with Data: A Guide to Quantitative Reasoning and Analysis. New Jersey: Princeton University Press.

Cahill, Niamh, Michelle Weinberger, and Leontine Alkema. 2020. “What Increase in Modern Contraceptive Use Is Needed in FP2020 Countries to Reach 75% Demand Satisfied by 2030? An Assessment Using the Accelerated Transition Method and Family Planning Estimation Model.” Gates Open Research 4. https://doi.org/10.12688/gatesopenres.13125.1.

Caro, Robert. 2019. Working. 1st ed. New York: Knopf.

Carroll, Lewis. 1871. Through the Looking-Glass. Macmillan. https://www.gutenberg.org/files/12/12-h/12-h.htm.

Castro, Marcia, Susie Gurzenda, Cassio Turra, Sun Kim, Theresa Andrasfay, and Noreen Goldman. 2023. “Research Note: COVID-19 Is Not an Independent Cause of Death.” Demography, February. https://doi.org/10.1215/00703370-10575276.

Chambliss, Daniel. 1989. “The Mundanity of Excellence: An Ethnographic Report on Stratification and Olympic Swimmers.” Sociological Theory 7 (1): 70–86. https://doi.org/10.2307/202063.

Chrétien, Jean. 2007. My Years as Prime Minister. 1st ed. Toronto: Knopf Canada.

Cleveland, William. (1985) 1994. The Elements of Graphing Data. 2nd ed. New Jersey: Hobart Press.

Craiu, Radu. 2019. “The Hiring Gambit: In Search of the Twofer Data Scientist.” Harvard Data Science Review 1 (1). https://doi.org/10.1162/99608f92.440445cb.

Doll, Richard, and Bradford Hill. 1950. “Smoking and Carcinoma of the Lung.” British Medical Journal 2 (4682): 739–48. https://doi.org/10.1136/bmj.2.4682.739.

Efron, Bradley, and Carl Morris. 1977. “Stein’s Paradox in Statistics.” Scientific American 236 (May): 119–27. https://doi.org/10.1038/scientificamerican0577-119.

Farrugia, Patricia, Bradley Petrisor, Forough Farrokhyar, and Mohit Bhandari. 2010. “Research Questions, Hypotheses and Objectives.” Canadian Journal of Surgery 53 (4): 278.

Fiske, Susan, and Shiro Kuriwaki. 2021. “Words to the Wise on Writing Scientific Papers,” November. https://doi.org/10.31234/osf.io/n32qw.

Fourcade, Marion, and Kieran Healy. 2017. “Seeing Like a Market.” Socio-Economic Review 15 (1): 9–29. https://doi.org/10.1093/ser/mww033.

Franklin, Laura. 2005. “Exploratory Experiments.” Philosophy of Science 72 (5): 888–99. https://doi.org/10.1086/508117.

Frei, Christoph, and Liam Welsh. 2022. “How the Closure of a U.S. Tax Loophole May Affect Investor Portfolios.” Journal of Risk and Financial Management 15 (5): 209. https://doi.org/10.3390/jrfm15050209.

Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. 1st ed. Cambridge University Press.

Gelman, Andrew, Cristian Pasarica, and Rahul Dodhia. 2002. “Let’s Practice What We Preach: Turning Tables into Graphs.” The American Statistician 56 (2): 121–30. https://doi.org/10.1198/000313002317572790.

Gerring, John. 2012. “Mere Description.” British Journal of Political Science 42 (4): 721–46. https://doi.org/10.1017/s0007123412000130.

Graham, Paul. 2020. “How to Write Usefully,” February. http://paulgraham.com/useful.html.

Gustafsson, Karl, and Linus Hagström. 2017. “What Is the Point? Teaching Graduate Students How to Construct Political Science Research Puzzles.” European Political Science 17 (4): 634–48. https://doi.org/10.1057/s41304-017-0130-y.

Hayot, Eric. 2014. The Elements of Academic Style. New York: Columbia University Press.

Hernán, Miguel, and James Robins. 2023. What If. 1st ed. Boca Raton: Chapman & Hall/CRC. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/.

Hug, Lucia, Monica Alexander, Danzhen You, Leontine Alkema, and UN Inter-agency Group for Child. 2019. “National, Regional, and Global Levels and Trends in Neonatal Mortality Between 1990 and 2017, with Scenario-Based Projections to 2030: A Systematic Analysis.” Lancet Global Health 7 (6): e710–20. https://doi.org/10.1016/S2214-109X(19)30163-9.

Hughes, Nicola, and Jill Rutter. 2016. “Ministers Reflect: Interview with Oliver Letwin,” December. https://www.instituteforgovernment.org.uk/ministers-reflect/person/oliver-letwin/.

Hulley, Stephen, Steven Cummings, Warren Browner, Deborah Grady, and Thomas Newman. 2007. Designing Clinical Research. 3rd ed. Lippincott Williams & Wilkins.

Iannone, Richard. 2022. DiagrammeR: Graph/Network Visualization. https://CRAN.R-project.org/package=DiagrammeR.

Joyner, Michael. 1991. “Modeling: Optimal Marathon Performance on the Basis of Physiological Factors.” Journal of Applied Physiology 70 (2): 683–87. https://doi.org/10.1152/jappl.1991.70.2.683.

Kahan, Brennan, Suzie Cro, Fan Li, and Michael Harhay. 2023. “Eliminating Ambiguous Treatment Effects Using Estimands.” American Journal of Epidemiology, February. https://doi.org/10.1093/aje/kwad036.

Kahan, Brennan, Joanna Hindley, Mark Edwards, Suzie Cro, and Tim Morris. 2024. “The Estimands Framework: A Primer on the ICH E9(R1) Addendum.” BMJ, January, e076316. https://doi.org/10.1136/bmj-2023-076316.

Kahan, Brennan, Fan Li, Andrew Copas, and Michael Harhay. 2022. “Estimands in Cluster-Randomized Trials: Choosing Analyses That Answer the Right Question.” International Journal of Epidemiology, July. https://doi.org/10.1093/ije/dyac131.

Kasy, Maximilian, and Alexander Teytelboym. 2023. “Matching with Semi-Bandits.” The Econometrics Journal 26 (1): 45–66. https://doi.org/10.1093/ectj/utac021.

Kennedy, Lauren, and Andrew Gelman. 2021. “Know Your Population and Know Your Model: Using Model-Based Regression and Poststratification to Generalize Findings Beyond the Observed Sample.” Psychological Methods 26 (5): 547–58. https://doi.org/10.1037/met0000362.

Keshav, Srinivasan. 2007. “How to Read a Paper.” ACM SIGCOMM Computer Communication Review 37 (3): 83–84. https://doi.org/10.1145/1273445.1273458.

Kharecha, Pushker, and James Hansen. 2013. “Prevented Mortality and Greenhouse Gas Emissions from Historical and Projected Nuclear Power.” Environmental Science & Technology 47 (9): 4889–95. https://doi.org/10.1021/es3051197.

Kiang, Mathew, Alexander Tsai, Monica Alexander, David Rehkopf, and Sanjay Basu. 2021. “Racial/Ethnic Disparities in Opioid-Related Mortality in the USA, 1999–2019: The Extreme Case of Washington DC.” Journal of Urban Health 98 (5): 589–95. https://doi.org/10.1007/s11524-021-00573-8.

King, Gary. 2006. “Publication, Publication.” PS: Political Science & Politics 39 (1): 119–25. https://doi.org/10.1017/S1049096506060252.

King, Stephen. 2000. On Writing: A Memoir of the Craft. 1st ed. Scribner.

Koenker, Roger, and Achim Zeileis. 2009. “On Reproducible Econometric Research.” Journal of Applied Econometrics 24 (5): 833–47. https://doi.org/10.1002/jae.1083.

Lamott, Anne. 1994. Bird by Bird: Some Instructions on Writing and Life. Anchor Books.

Latour, Bruno. 1996. “On Actor-Network Theory: A Few Clarifications.” Soziale Welt 47 (4): 369–81. http://www.jstor.org/stable/40878163.

Light, Richard, Judith Singer, and John Willett. 1990. By Design: Planning Research on Higher Education. 1st ed. Cambridge: Harvard University Press.

Little, Roderick, and Roger Lewis. 2021. “Estimands, Estimators, and Estimates.” JAMA 326 (10): 967. https://doi.org/10.1001/jama.2021.2886.

Lucas, Robert. 1978. “Asset Prices in an Exchange Economy.” Econometrica 46 (6): 1429–45. https://doi.org/10.2307/1913837.

Lundberg, Ian, Rebecca Johnson, and Brandon Stewart. 2021. “What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory.” American Sociological Review 86 (3): 532–65. https://doi.org/10.1177/00031224211004187.

McElreath, Richard. (2015) 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. 2nd ed. Chapman and Hall/CRC.

McPhee, John. 2017. Draft No. 4. 1st ed. Farrar, Straus and Giroux.

Meng, Xiao-Li. 1994. “Multiple-Imputation Inferences with Uncongenial Sources of Input.” Statistical Science 9 (4): 538–58. https://doi.org/10.1214/ss/1177010269.

———. 2012. “You Want Me to Analyze Data I Don’t Have? Are You Insane?” Shanghai Archives of Psychiatry 24 (5): 297–301. https://doi.org/10.3969/j.issn.1002-0829.2012.05.011.

———. 2018. “Statistical Paradises and Paradoxes in Big Data (I): Law of Large Populations, Big Data Paradox, and the 2016 US Presidential Election.” The Annals of Applied Statistics 12 (2): 685–726. https://doi.org/10.1214/18-AOAS1161SF.

Mok, Lillio, Samuel Way, Lucas Maystre, and Ashton Anderson. 2022. “The Dynamics of Exploration on Spotify.” In Proceedings of the International AAAI Conference on Web and Social Media, 16:663–74. https://doi.org/10.1609/icwsm.v16i1.19324.

Orwell, George. 1946. Politics and the English Language. https://www.orwellfoundation.com/the-orwell-foundation/orwell/essays-and-other-works/politics-and-the-english-language/.

Pineau, Joelle, Philippe Vincent-Lamarre, Koustuv Sinha, Vincent Larivière, Alina Beygelzimer, Florence d’Alché-Buc, Emily Fox, and Hugo Larochelle. 2021. “Improving Reproducibility in Machine Learning Research (a Report from the NeurIPS 2019 Reproducibility Program).” Journal of Machine Learning Research 22 (164): 1–20. http://jmlr.org/papers/v22/20-303.html.

Rosenau, James N. 1999. “A Transformed Observer in a Transforming World.” Studia Diplomatica 52 (1/2): 5–14. http://www.jstor.org/stable/44838096.

Samuel, Arthur. 1959. “Some Studies in Machine Learning Using the Game of Checkers.” IBM Journal of Research and Development 3 (3): 210–29. https://doi.org/10.1147/rd.33.0210.

Savage, Van, and Pamela Yeh. 2019. “Novelist Cormac McCarthy’s Tips on How to Write a Great Science Paper.” Nature 574 (7778): 441–42. https://doi.org/10.1038/d41586-019-02918-5.

Sen, Amartya. 1980. “Description as Choice.” Oxford Economic Papers 32 (3): 353–69. https://doi.org/10.1093/oxfordjournals.oep.a041484.

Sides, John, Lynn Vavreck, and Christopher Warshaw. 2021. “The Effect of Television Advertising in United States Elections.” American Political Science Review, 1–17. https://doi.org/10.1017/s000305542100112x.

Smith, Jessie, Saleema Amershi, Solon Barocas, Hanna Wallach, and Jennifer Wortman Vaughan. 2022. “REAL ML: Recognizing, Exploring, and Articulating Limitations of Machine Learning Research.” 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). https://doi.org/10.1145/3531146.3533122.

Student. 1908. “The Probable Error of a Mean.” Biometrika 6 (1): 1–25. https://doi.org/10.2307/2331554.

The Economist. 2013. “Johnson: Those Six Little Rules: George Orwell on Writing,” July. https://www.economist.com/prospero/2013/07/29/johnson-those-six-little-rules.

Tolley, Erin, and Mireille Paquet. 2021. “Gender, Municipal Party Politics, and Montreal’s First Woman Mayor.” Canadian Journal of Urban Research 30 (1): 40–52. https://cjur.uwinnipeg.ca/index.php/cjur/article/view/323.

Touvron, Hugo, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, et al. 2023. “LLaMA: Open and Efficient Foundation Language Models.” arXiv. https://doi.org/10.48550/ARXIV.2302.13971.

Wardrop, Robert. 1995. “Simpson’s Paradox and the Hot Hand in Basketball.” The American Statistician 49 (1): 24–28. https://doi.org/10.2307/2684806.

Wickham, Hadley, Mara Averick, Jenny Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Xie, Yihui. 2023. knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.

Zinsser, William. 1976. On Writing Well. New York: HarperCollins.

While there is sometimes a need for a separate literature review section, another approach is to discuss relevant literature throughout the paper as appropriate. For instance, when there is literature relevant to the data then it should be discussed in this section, while literature relevant to the model, results, or discussion should be mentioned as appropriate in those sections.