Structure of a quantitative empirical research paper

From CommunityData

Organization

Most quantitative empirical research papers share a similar structure built from a very similar set of sections. These are detailed below. Formatting requirements usually vary between journals (e.g., APA6, Chicago, MLA). In the social sciences, APA6 seems to be the most common.

Front matter

Your front matter is not a formal section but a collection of material that comes before your paper. Front matter usually includes:

  • A title page that includes:
    • The full title of the paper.
    • The name, affiliation, and email of each author.
    • A note (often in a footnote) thanking others for support.
  • An abstract, usually between 150-250 words.

Introduction

Your introduction should be short: no more than 2-3 pages and 5-6 paragraphs. It should only seek to do three things (with an optional fourth):

  1. Introduce and motivate your work. What is the topic of this research? Why is this research worth pursuing?
  2. Establish the importance, relevance, and impact of your work. What is the research question? Why is the question important? What data and methods does the paper use to answer it? Together, the answers should provide a clear response to the question, "Why should a reader care?"
  3. Foreshadow the key findings and contributions of the study. What do we know now that we did not know before?
  4. (Optional) In the final paragraph, lay out the organization of the rest of the paper.

Background

There is more general advice on writing introduction and background sections elsewhere on the wiki. Given a solid introduction that does its job, your background section only needs to do two additional things:

  1. Define the terms you'll be using in your study.
  2. Build up the rationale for your hypotheses.

Critically, a background section is not a comprehensive literature review. Done well, it is nothing more than a coherent argument that presents your research questions and the rationale that lies behind them. That's it.

The background section should end with your hypothesis or hypotheses. If you have several distinct hypotheses, it might make sense to create subsections of your background for each hypothesis. Then you can end each subsection with the hypothesis itself.

Research Design

This section should present details of how you carried out your study. Usually, it will include subsections that touch on each of the following items (not every item needs its own subsection, although it's fine if each has one):

Empirical Setting
Use this section to describe the site of your research in detail and provide any important context.
Research Ethics
Describe any ethical issues related to this work. For example, this is a place to describe the process through which you got IRB approval to carry out your research. If your work does not require IRB approval, say this and explain how your work minimizes risks to the human subjects whose data is captured in your dataset.
Procedures
Describe the process that you used to collect your data. Detail choices you made along the way that included or excluded any data. This doesn't need to be a diary of everything you tried, but it should be comprehensive enough for someone to reproduce your dataset given access to the same material and setting that you had.
Sample
Describe your sample. This will include the number of observations in your sample but also any other details or summary statistics that help us understand the nature of the sample you have collected.
Measures
Describe every variable you included in your analysis and model and describe how it was constructed and/or coded. It usually makes sense to start with dependent variables, then focus on question predictors, and finally talk about control variables. Often it makes sense to use a table to organize this information. This section must establish your variable names, your variable definitions, and your value codings. This is an appropriate place to include your tables of univariate and bivariate statistics for all of the variables in your model.
Analytic Plan
The analytic plan should detail all of the analyses that you performed. You should mention what type of model you used and explain why you believe it is the appropriate method. Typically, this includes specifying the regression model that you've used. You should include the regression equation.
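
As an illustration only, a linear model with one question predictor and two control variables could be written as below. The symbols are placeholders rather than variables from any particular study: Y is the dependent variable, X the question predictor, C_1 and C_2 the controls, and i indexes observations. Substitute the variable names you established in your measures section.

```latex
\[
Y_i = \beta_0 + \beta_1 X_i + \beta_2 C_{1i} + \beta_3 C_{2i} + \varepsilon_i
\]
```

Writing the equation out this way makes it easy for readers to match each coefficient in your results table to a term in the model.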

Findings

With good preparation, a findings section can be very short.

My findings sections usually:

  • Refer to a table with the results from my fitted models.
  • Interpret the coefficients in the models directly from the tables. I usually mention the estimates as well as the standard errors and t-statistics and p-values.
  • Next, I describe predicted values from the model for hypothetical (often called "prototypical") observations. I frequently hold all values at their sample median and vary only the key question predictor by a reasonable amount (e.g., from the 25th percentile to the 75th percentile). If I plug those values into my model, what are the results?
  • I usually include a visualization or graph of model-predicted values.
  • Finally, I usually try to include a paragraph that interprets the controls with at least a reference to their signs. Were the controls effective? Did they point the way we expected? The controls are the controls, after all, so they aren't the main event. That said, show us that you paid attention to them at least.
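
The prototypical-observation step above can be sketched in a few lines. This is a minimal Python sketch, not a prescribed workflow: the coefficients, variable names, and sample values are all made up for illustration, and it assumes a simple linear model with no interactions or transformations.

```python
from statistics import median, quantiles

# Hypothetical fitted coefficients from a linear model (made up for illustration):
# outcome = intercept + b_predictor * predictor + b_control * control
coefs = {"intercept": 1.0, "predictor": 0.5, "control": -0.25}

# Hypothetical sample values (made up for illustration).
sample = {
    "predictor": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "control": [10.0, 12.0, 8.0, 14.0, 10.0, 6.0, 12.0],
}


def predict(coefs, values):
    """Return the model-predicted outcome for one observation."""
    return coefs["intercept"] + sum(coefs[name] * x for name, x in values.items())


def prototypical_predictions(coefs, sample, key):
    """Hold every variable at its sample median, then vary `key`
    from its 25th to its 75th percentile."""
    baseline = {name: median(xs) for name, xs in sample.items()}
    q1, _, q3 = quantiles(sample[key], n=4, method="inclusive")
    low = predict(coefs, {**baseline, key: q1})
    high = predict(coefs, {**baseline, key: q3})
    return {"p25": q1, "p75": q3, "pred_at_p25": low, "pred_at_p75": high}


result = prototypical_predictions(coefs, sample, "predictor")
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The difference between the two predicted values is the kind of substantive quantity worth reporting in prose, since it translates a coefficient into a change a reader can picture.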

As you write the section, walk folks through the substantive takeaways from your results. Explain how these results support, or provide evidence that fails to support, your hypotheses. Be very explicit.

Threats to Validity / Limitations

Every study has limitations and faces important threats to validity. It's your job to describe all the ways that your results are contingent. In particular, make sure you discuss:

Threats to internal validity
Why might we doubt the results of this work? What assumptions that underlie your results may not hold? Why not? What are the threats to construct validity that underlie your analysis?
Threats to external validity
Explain why your work might fail to generalize to other empirical settings or samples.

To the extent that you can present evidence, additional analyses, or robustness checks that address these concerns, that's great. To the extent that some of these concerns will be left on the table, it's better for you to foreground these here.

Explain why, even with important threats and limitations, you think your work still makes an important contribution.

Discussion

  • Summarize your findings.
  • Connect back to your background and the initial rationale described in the front-end of your paper.
  • Discuss future research. Don't just say that future research is needed; explain, concretely, what particular future work would address the limitations of your work described in the previous section.

Bibliography

Straightforward enough, but read it carefully before you submit. Misspelled authors' names don't seem like a huge deal but they can haunt you. The misspelled authors will notice.

Appendix

Nearly every journal allows you to have an online appendix.

Appendixes can include copies of instruments, longer descriptions of variables or of a dataset or the process necessary to collect it, additional robustness checks and tables, and commentary on specific analyses. These are unrestricted. Use them.

Not every journal allows you to submit these with the paper. If your journal doesn't, you can submit your appendix to FigShare, which will create an archival version with a DOI that you can cite from your manuscript and which will be maintained by librarians and archives going forward.

Tables

There are three types of tables that every quantitative paper should include:

Univariate statistics
This should include 1-3 tables that describe the mean, median, standard deviation, and range of every variable in your analysis. If you have many categorical or dichotomous variables, you'll probably just want to show proportions and counts.
Bivariate statistics
In most cases, a simple triangular correlation table output from cor() is enough.
Regression/model results
This should be the central piece of evidence presented in your paper. I like the tables produced by screenreg() (or really, texreg() and htmlreg()) in the texreg package in R. stargazer and apsrtable do something very similar.
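
To make the contents of the univariate and bivariate tables concrete, here is a minimal sketch. It is written in plain Python rather than R so that it has no package dependencies. The dataset and variable names are made up for illustration, and the helper functions (univariate_table, lower_triangle_correlations) are hypothetical names, not part of any library. It assumes every variable is continuous; categorical variables would need counts and proportions instead.

```python
from statistics import mean, median, stdev

# Hypothetical dataset (made up for illustration): one list of values per variable.
data = {
    "edits": [2.0, 4.0, 6.0, 8.0, 10.0],
    "tenure": [1.0, 2.0, 3.0, 4.0, 5.0],
    "messages": [5.0, 3.0, 4.0, 1.0, 2.0],
}


def univariate_table(data):
    """Mean, median, standard deviation, and range for every variable."""
    return {
        name: {
            "mean": mean(xs),
            "median": median(xs),
            "sd": stdev(xs),  # sample standard deviation
            "min": min(xs),
            "max": max(xs),
        }
        for name, xs in data.items()
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5


def lower_triangle_correlations(data):
    """Correlation for each pair of variables, lower triangle only."""
    names = list(data)
    return {
        (row, col): pearson(data[row], data[col])
        for i, row in enumerate(names)
        for col in names[:i]
    }


summary = univariate_table(data)
correlations = lower_triangle_correlations(data)

for name, stats in summary.items():
    print(name, {k: round(v, 2) for k, v in stats.items()})
for (row, col), r in correlations.items():
    print(f"{row} x {col}: {r:.2f}")
```

Reporting only the lower triangle avoids repeating each pair twice, which is why triangular correlation tables are the usual presentation.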

Credit

Much of this material is drawn and adapted from John B. Willett's "Structure of a Scholarly Research Paper."