CommunityData:Build papers

The final PDF of a LaTeX document must be built from a source file. Many of our projects involve additional build steps, so we use [https://en.wikipedia.org/wiki/Makefile Makefiles] to manage the workflow. This document gives an overview of the basic process and identifies good practices for quantitative projects.


== Project Creation ==
 
Follow the directions for installing the [[CommunityData:TeX|LaTeX templates]]. Once they are installed, you can run something like:
 
    cd ~/Projects
    new_knitr_document project_name
 
This will create a new project called <code>project_name</code>.
 
This project will include a Makefile and all of the pieces needed to create a document.
 
== Zotero ==
 
The next step is to set up your references. Often, this means creating a new collection in Zotero. Follow the directions on the [[CommunityData:Zotero]] page. You should export your Zotero collection as Better BibLaTeX to the <code>refs.bib</code> file, and check "Keep updated" so that Zotero automatically updates that file whenever the collection changes.
 
== Making the paper ==
 
# The first time you're building the paper, you can just run <code>make</code> or <code>make all</code>. After that, you probably want to run <code>make clean; make all</code>. This should work whether you're using an .Rtex (knitr) or .tex (LaTeX) file.
 
== Git ==
 
At this point, you should put your project into git. See [[CommunityData:Git]] for instructions.
 
 
== Overleaf ==
 
If you're using Overleaf for your papers, that's great. Many of us use Overleaf at some point in our writing process. This section describes a reasonable workflow for putting papers on Overleaf.
 
=== Dropbox Sync ===
 
Overleaf works well for a combined online/offline workflow and keeps documents in sync. Most of us set this up through Dropbox sync, which you can enable in your Overleaf account settings. Once you do that, all of your Overleaf projects will appear in <code>~/Dropbox/Apps/Overleaf/</code>.
 
=== Creating a Project ===
 
Overleaf supports the entire pipeline that we typically use when building TeX papers from a local computer, including knitr and Makefiles. Therefore, the best way to put a complicated project onto Overleaf is to create it on your computer first. You can do this by going to the <code>Overleaf</code> directory in Dropbox and using the <code>new_knitr_document</code> command. This will give you a new document with all of the needed resources and will also create a new project on Overleaf.
 
=== Data and Code ===
 
You should not put your entire project on Overleaf. If you follow the suggested directory structure below, then Overleaf should contain only the <code>paper/</code> directory. The code and data should typically live somewhere else.
 
The one problem with this is that you likely want the whole project, including the paper, in the same place and tracked in a git repository. One approach is to include an option in the Makefile that copies the Dropbox version into the canonical directory.
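
A minimal sketch of that copy step, written as a shell function that a Makefile target could call. The function name <code>sync_paper_from_dropbox</code> and the example paths in the comments are hypothetical and not part of our templates, so adjust them to your own project.

```shell
# Hypothetical helper: copy the Overleaf/Dropbox version of a paper
# into the paper/ directory of the canonical (git-tracked) project.
sync_paper_from_dropbox() {
    src="$1"   # e.g. ~/Dropbox/Apps/Overleaf/project_name (assumed location)
    dest="$2"  # e.g. ~/Projects/project_name/paper (assumed location)

    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi

    mkdir -p "$dest"
    # Copy the contents of src (including dotfiles) into dest,
    # overwriting files that already exist there.
    cp -R "$src"/. "$dest"/
}
```

This only adds and overwrites files. It does not delete files that were removed on Overleaf, so it is worth checking <code>git status</code> before committing the result.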
 
 
=== The bibliography file ===
 
The best practice is still to use Zotero to manage your bibliography. If you have installed Better BibTeX (and if you haven't, then you should!), you can export your bibliography and keep it updated. Right-click on the collection you want to export, choose the format (likely Better BibTeX), and select "Keep updated".
 
[[File:better_bibtex_export.png|Example of exporting from Zotero]]
 
You should export it into the <code>refs.bib</code> file in your Dropbox Overleaf project, and then Better BibTeX and Dropbox will keep it updated on Overleaf for you.


== Style notes and more details (add as needed) ==


* For sanity, it's good to create sub-directories within the paper directory to store things like knitr data and figures. For most of our existing projects these sub-directories have informative names like <code>knitter_data</code> and <code>figures</code>.
* Don't edit <code>refs-processed.bib</code> by hand. This is a file that the Makefile builds every time it compiles the paper. If you have some reason to edit the bibliography by hand, edit <code>refs.bib</code>, but do so at your own risk: unless you tell them otherwise, collaborators and other Community Data folks may come by your repository and try to build the paper by downloading a new bib file from Zotero, which would overwrite your changes!
* The typical directory structure that I use is something like:
 
    /code
      01_script.py
      02_script.py
      03_analysis.R
      README
    /data
      /raw_data
        raw_data_file.csv
      measures.csv
      analyzed_data.RData
    /paper
      /data
        analyzed_data.RData (symlinked)
      /figures (any figures not created with knitr)
      paper.Rtex
 
The code all lives in one place, with a README that explains what the pieces do. Don't underestimate the importance of a README and well-commented code. Your future self will thank you.
 
The data lives in another place, with <code>raw_data</code> kept read-only (it may or may not be stored in git, depending on its size). This raw data is used to create measures (and possibly some intermediate files). Analysis is done on the measures file, and the results are stored in an RData file.
 
It is tempting to put as much of the analysis workflow as possible into the .Rtex file, but these files can quickly become unwieldy and difficult to compile. It's often better to do the heavy lifting in a separate script, import an .RData file in the paper, and do things like creating plots and running simple statistical tests in the paper itself.
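
The layout above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name <code>scaffold_project</code> is hypothetical, and the <code>new_knitr_document</code> template may already create some of these pieces for you.

```shell
# Hypothetical helper: create the directory layout described above
# under a given project root.
scaffold_project() {
    root="$1"

    mkdir -p "$root/code" \
             "$root/data/raw_data" \
             "$root/paper/data" \
             "$root/paper/figures"

    # Create a README placeholder so the code directory is documented
    # from the start; leave an existing README untouched.
    [ -e "$root/code/README" ] || touch "$root/code/README"

    # Symlink the analyzed data into the paper directory. The relative
    # target is resolved from paper/data/, so ../../data is the top-level
    # data/ directory. The link dangles until the analysis script has
    # written analyzed_data.RData.
    ln -sf ../../data/analyzed_data.RData "$root/paper/data/analyzed_data.RData"
}
```

For example, <code>scaffold_project ~/Projects/project_name</code> would create the tree under that directory. Once the raw files are in place, something like <code>chmod -R a-w data/raw_data</code> is one way to make them read-only.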
 
* You may need to use [https://tex.stackexchange.com/questions/25701/bibtex-vs-biber-and-biblatex-vs-natbib natbib] for some journal submission styles. The linked Stack Exchange page helps to explain how to do that (but if anyone wants to give more detailed instructions, please do!).

Revision as of 21:54, 19 March 2020
