CommunityData:Build papers

Revision as of 23:57, 10 June 2019 by Jdfoote (talk | contribs) (Reworking this page to (hopefully) be more helpful.)

When creating LaTeX documents, the final PDF output must be built from an input file. For many of our projects the build involves more than that, so we use Makefiles to manage the workflow. This document gives an overview of the basic process and identifies good practices for quantitative projects.

Project Creation

Follow the directions for installing the LaTeX templates. Once they are installed, you can run something like:

   cd ~/Projects
   new_knitr_document project_name

This will create a new project called project_name. It will include a Makefile and all of the pieces needed to create a document.


The next step is to set up your references. Often, this means creating a new directory in Zotero. Follow the directions on the CommunityData:Zotero page. You should export your Zotero directory as Better BibLaTeX to the refs.bib file, and check "Keep updated" so that Zotero automatically updates that file whenever the directory changes.

Making the paper

  1. The first time you're building the paper, you can just run make or make all. After that, you probably want to run make clean; make all. This should work whether you're using an .Rtex (knitr) or .tex (LaTeX) file.
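
As a rough illustration of that build cycle, the sketch below runs the same commands against a toy project. The Makefile here is a hypothetical stand-in (cp takes the place of the real LaTeX/knitr build, and paper.tex and paper.pdf are made-up names); the Makefile that the templates create will look different.

```shell
set -eu

# Work in a scratch directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"

# Hypothetical stand-in Makefile: "cp" plays the role of the real build step.
# printf is used so that the recipe lines start with a literal tab.
{
  printf 'all: paper.pdf\n\n'
  printf 'paper.pdf: paper.tex\n\tcp paper.tex paper.pdf\n\n'
  printf 'clean:\n\trm -f paper.pdf\n\n'
  printf '.PHONY: all clean\n'
} > Makefile

echo 'draft text' > paper.tex

make         # first build; in this toy Makefile "all" is the first target, so this equals "make all"
make clean   # later builds: remove the old output first...
make all     # ...then rebuild from scratch
```

Running make clean first guarantees that the next make all regenerates the output instead of reusing whatever was left from an earlier build.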


At this point, you should put your project into git. See CommunityData:Git for instructions.

ShareLatex adaptation

If you're using ShareLatex for your papers, that's great. The best practice is still to update the bibliography in Zotero, then export from there. Ideally, export into refs.bib and then upload that file to ShareLatex, so that the paper directory in our shared git repositories stays up to date!

Style notes and more details (add as needed)

  • The typical directory structure that I use is something like:
       analyzed_data.RData (symlinked)
     /figures (any figures not created with knitr)

The code all lives in one place, with a README that explains what the pieces do. Don't underestimate the importance of a README and well-commented code. Your future self will thank you.

The data lives in another place, with raw_data, which is read-only (and may or may not be stored in git, depending on its size). This raw data is used to create measures (and possibly some intermediate files). Analysis is done on the measures file, and the results are stored in an RData file.
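
One way to lay this out, as a sketch only: the directory names code, data, and paper are assumptions made for illustration (raw_data, analyzed_data.RData, figures, and the README come from the description above), and the script works in a temporary directory so it is harmless to run.

```shell
set -eu

# Hypothetical project root; a temporary directory keeps this sketch harmless.
project=$(mktemp -d)

# Code lives in one place, with a README that explains the pieces.
mkdir -p "$project/code"
printf 'What each script does, and in what order to run them.\n' > "$project/code/README"

# Data lives in another place; raw_data is made read-only.
mkdir -p "$project/data/raw_data"
: > "$project/data/analyzed_data.RData"   # empty placeholder for the real analysis output
chmod a-w "$project/data/raw_data"

# The paper directory holds figures not created with knitr,
# plus a symlink to the analysis output.
mkdir -p "$project/paper/figures"
ln -s ../data/analyzed_data.RData "$project/paper/analyzed_data.RData"

ls -l "$project/paper"
```

The symlink lets the paper read the current analysis output without keeping a second copy of it in the paper directory.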

It is tempting to put as much of the analysis workflow as possible into the .Rtex file, but these files can quickly become unwieldy and difficult to compile. It's often better to do the heavy lifting in a separate script, import an .RData file in the paper, and do things like creating plots and running simple statistical tests in the paper itself.

  • You may need to use natbib for some journal submission styles. The stackexchange page linked above helps to explain how to do that (but if anyone wants to give more detailed instructions, please do!).