CommunityData:Automating and Streamlining Walkthrough

Welcome to the Automation and Streamlining Walkthrough!

This guide walks through why and how you might adopt some of our tips and tricks for automating and streamlining your research workflow. Our questions and methods bring their own challenges and complexities, and these are the strategies we use to avoid certain kinds of annoyances, traps, and mistakes. Some of what's described here will be a lot easier if you have completed the CommunityData:Onboarding Checklist. CDSC members will want to make sure they have a fresh copy of the cdsc_examples git repository.

Staying Organized

The kiddie cartoon version of the scientific method describes a linear and rather sterile process from hypothesis to experiment to insight. The reality looks a lot messier. We rummage around, scratch our heads, wander down dark alleys, think and re-think, scrape, crunch, gather ... and then we look around at all the beautiful mess we've made and try to turn it into a paper: write and re-write, submit, revise, re-submit, re-revise, re-resubmit. Then, maybe a year or two later, we're announcing, releasing, publishing and presenting. Keeping track of the weird and wild ride can be tremendously helpful when it comes time to write it up and tell the world what you've learned.

  1. Take notes for yourself as you go along -- think of it as keeping a lab notebook. Use whatever works for you, but make sure it is:
    1. Searchable
    2. Backed up
    3. Filled with all kinds of context. Copy in commands, URLs, wild ideas, things you tried and why.
  2. Don't let data-related details fall through the cracks. Keep track of metadata (where each dataset came from, when you collected it, and how), because future you will forget.
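
One lightweight way to keep metadata from getting lost is to write a small "sidecar" file next to each dataset at the moment you collect it. The following is only a sketch of that idea in Python, not something taken from the lab's tooling: the function names, the `.meta.json` suffix, and the recorded fields are all assumptions chosen for illustration.

```python
import json
from datetime import date
from pathlib import Path


def sidecar_path(data_path):
    """Return the path of the metadata file that sits next to a data file.

    The '.meta.json' suffix is a made-up convention for this sketch.
    """
    p = Path(data_path)
    return p.with_name(p.name + ".meta.json")


def write_metadata(data_path, source, command, notes="", retrieved=None):
    """Record where a dataset came from, how it was fetched, and when."""
    record = {
        "file": Path(data_path).name,
        "source": source,      # URL or description of the origin
        "command": command,    # the command or script that produced the file
        "retrieved": (retrieved or date.today()).isoformat(),
        "notes": notes,        # free text: caveats, filters applied, wild ideas
    }
    out = sidecar_path(data_path)
    out.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return out
```

Because the sidecar is plain JSON sitting beside the data, it is searchable and gets backed up along with everything else in the project directory.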

Automating

Automation can be extremely helpful, but it's an investment. You will not regret time spent on modest automation, particularly if you do computational work. You never want to be in the position of copy-pasting from R into LaTeX or Word: it is error prone, and when you later need to revise, you might not remember where a number came from. What you want instead is some automation magic, so that every time you run your R code, your new data and fresh visualizations land in your Overleaf. Example code for making this work is in cdsc_examples/R_examples/automation in the cdsc_examples git repository. If you want to learn even more, there's a little tutorial on Knitr, and here's a more expanded guide.
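
To make the "no copy-pasting numbers" idea concrete, here is a minimal sketch, written in Python rather than R and not drawn from the cdsc_examples code. It assumes your analysis produces a dictionary of already-formatted values and writes them out as LaTeX macros. The function names, the example keys, and the numbers in the demo are all invented for illustration.

```python
from pathlib import Path


def latex_macro_name(key):
    """Turn a key like 'n_users' into a letters-only macro name like 'NUsers'.

    LaTeX control words may contain letters only, so digits, underscores,
    and other characters are treated as word separators and dropped.
    """
    words = "".join(ch if ch.isalpha() else " " for ch in key).split()
    if not words:
        raise ValueError(f"cannot build a macro name from {key!r}")
    return "".join(w.capitalize() for w in words)


def render_macros(values):
    """Render a dict of {name: formatted string} as \\newcommand lines."""
    lines = ["% Generated file: do not edit by hand."]
    for key, value in values.items():
        lines.append(f"\\newcommand{{\\{latex_macro_name(key)}}}{{{value}}}")
    return "\n".join(lines) + "\n"


def write_macros(values, path):
    """Write the macro file; re-run after every analysis run."""
    Path(path).write_text(render_macros(values), encoding="utf-8")


if __name__ == "__main__":
    # Made-up numbers, formatted by the caller before they reach LaTeX.
    stats = {"n_users": f"{12345:,}", "median_edits": f"{3.14159:.2f}"}
    print(render_macros(stats))
```

If the generated file (say, a hypothetical `stats.tex`) is synced into the paper's project, the manuscript can `\input{stats.tex}` and write `\NUsers{}` wherever the number appears, so a re-run of the analysis updates every occurrence. Values containing LaTeX special characters such as `%` would need escaping, which this sketch does not attempt.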

  1. Set up this marvelous R - Overleaf automation
  2. Google Sheets downloading automation
  3. Auto-populating URLs in Google Sheets to streamline content analysis
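
As a rough illustration of the last two items, the Python sketch below downloads one tab of a Google Sheet as CSV and adds a column of URLs built from an identifier column. It is not the lab's implementation. It assumes the sheet is shared so that anyone with the link can view it, and that Google's CSV export address keeps the form shown here. The function names, field names, and base URL are hypothetical.

```python
import csv
import io
import urllib.request
from urllib.parse import quote


def sheet_csv_url(sheet_id, gid="0"):
    """Build the CSV export address for one tab of a Google Sheet.

    sheet_id and gid are the values that appear in the sheet's own URL.
    """
    return (
        f"https://docs.google.com/spreadsheets/d/{sheet_id}"
        f"/export?format=csv&gid={gid}"
    )


def download_sheet(sheet_id, gid="0"):
    """Fetch a link-readable sheet and return its rows as dictionaries."""
    with urllib.request.urlopen(sheet_csv_url(sheet_id, gid)) as resp:
        text = resp.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def add_url_column(rows, id_field, base_url, url_field="url"):
    """Return copies of the rows with a URL built from each row's identifier."""
    return [
        {**row, url_field: base_url + quote(str(row[id_field]), safe="")}
        for row in rows
    ]
```

With a URL column filled in automatically, a coder doing content analysis can click straight through to each item instead of searching for it by hand.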

Building from Prior Efforts

You can save yourself a lot of time and confusion if you build from the work that others have done in the lab to streamline and automate their work.

Build your bibliography using our shared Zotero. If you have Dropbox wired up to your Overleaf, that means one or zero clicks to keep your bibliography up to date. Don't paste citation formatting straight from the web into your Overleaf or hand-format your citations: that way leads only to tears and the gnashing of teeth!

Our collection of LaTeX templates is helpful for making nice documents.

Revise and Resubmit

Revise and resubmit makes papers better but can be a time crunch, and if you have co-authors, it can get hard to tell what's been done or what certain revisions were trying to accomplish. When you get reviews back that ask you to make changes, you can save yourself a lot of annoyance if you follow this four-part method:

  1. Keep a copy of your original submission, including the main.tex file. If you are using an .Rtex file because you followed the above advice about automation, note that the main.tex file is hidden in Overleaf: click the little 'document' icon next to 'Recompile' (the mouseover tooltip says 'Logs and output files'), then 'Other logs and files' at the bottom right. The output.bbl file is there too, which is also useful because you'll want it for your upload to arXiv if you post a preprint. You will need the original main.tex for the LaTeX diff below.
  2. Paste all your reviews into a Google Sheet -- feel free to use this as a template.
  3. Also paste all your reviews into a response-to-reviewers letter in the same Overleaf project where you wrote the paper. Place them below the \end{document} line so they won't show up. As you revise to address each issue, move the comment up into the letter, quote it, and then write what you did. It might feel a skosh redundant with the Google Sheet, and it is ... think of it as double-entry bookkeeping to make sure nothing gets dropped.
  4. Show your work: some venues require (and reviewers often appreciate) a document that shows what was changed. If you use Word, that means turning on Track Changes. If you use LaTeX, latexdiff is a way to achieve this. Latexdiff is available on CommunityData:Kibo.

Please note that the cdsc_examples repository contains examples of past revision processes.
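
The latexdiff step in the method above can be scripted so that the diff is regenerated whenever the revision changes. The sketch below is one possible Python wrapper, assuming `latexdiff` is installed and on your PATH (as it is on Kibo, per the note above). The function names and file paths are hypothetical.

```python
import subprocess
from pathlib import Path


def latexdiff_command(old_tex, new_tex):
    """Build the basic latexdiff invocation: old file first, then new file."""
    return ["latexdiff", str(old_tex), str(new_tex)]


def write_diff(old_tex, new_tex, diff_tex):
    """Run latexdiff and save its standard output as a new .tex file.

    latexdiff prints the marked-up document to standard output, so the
    result is captured here and written to diff_tex.
    """
    result = subprocess.run(
        latexdiff_command(old_tex, new_tex),
        check=True,            # raise if latexdiff exits with an error
        capture_output=True,
        text=True,
    )
    out = Path(diff_tex)
    out.write_text(result.stdout, encoding="utf-8")
    return out
```

A call such as `write_diff("original/main.tex", "main.tex", "diff.tex")` does the same thing as running `latexdiff original/main.tex main.tex > diff.tex` in a shell. Compile the resulting diff.tex like any other LaTeX file to get the document showing what changed.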