CommunityData:Automating and Streamlining Walkthrough

Welcome to the Automating and Streamlining Walkthrough!

This guide steps you through why and how you might like to adopt some of our tips and tricks around automating and streamlining your research workflow. This guide is opinionated---you might not find all of it useful---but this advice is built from lived experience: these are strategies CDSC members use to keep away from certain kinds of annoyances, traps, and mistakes. Some of what's described here will be a lot easier if you have completed the CommunityData:Onboarding Checklist. CDSC members will want to make sure they have a fresh copy of the cdsc_examples git repository.

Staying Organized

The kiddie cartoon version of the scientific method describes a linear and rather sterile process from hypothesis to experiment to knowledge---the reality looks a lot messier. We rummage around, scratch our heads, wander down dark alleys, scrape, crunch, gather....and then we look around at all the beautiful mess we've made and try to turn it into a paper: write and re-write, submit, revise, re-submit, re-revise, re-re-submit --- and then maybe a year or two later, we're announcing, releasing, publishing, and presenting. Keeping track of the weird and wild ride can be tremendously helpful when it comes time to write it up and tell the world what you've learned.

  1. Take notes for yourself as you go along---think of it as keeping a lab notebook. Use whatever works for you, but make sure it is:
    1. Searchable
    2. Backed up
    3. Filled with all kinds of context. Copy in commands, URLs, wild ideas, things you tried and why. It can be therapy; nobody but you will read it.
  2. Don't let data-related details fall through the cracks. Keep track of metadata because future you will forget.

People use quite different tools and applications for this purpose. Flat files (.txt or .org or .md) are often handy. Literate programming systems are also handy (Jupyter notebooks, R Markdown). Others prefer dedicated note-taking systems like Obsidian. Talk to others in the group to learn more about any of these options and more.

Automating

Automation can be extremely helpful, but it's an investment. You will not regret time spent on modest automation, in particular if you do computational work. You never want to be in the position of copy-pasting from R into LaTeX or Word: it is error-prone, and when you later find yourself needing to revise, you might not remember where you got that number from. Instead, what you want is some automation magic, so that every time you run your R code, your new data and fresh visualizations land in your Overleaf. Example code for making this work is in the git repository at cdsc_examples/R_examples/automation. If you want to learn even more, there's a little tutorial on Knitr, and here's a more expanded guide.
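
To make the idea concrete, here is a minimal sketch of the "no copy-pasting numbers" pattern. It is not the code in cdsc_examples (that uses R and Knitr); it is a Python illustration of the same idea, and the function names, macro names, and the file name generated_numbers.tex are all made up for this example. The analysis script writes each result into a small .tex file of macros, and the paper pulls the values in from there.

```python
from pathlib import Path

# Characters with special meaning in LaTeX that need a backslash escape.
_LATEX_SPECIALS = {"%": r"\%", "&": r"\&", "#": r"\#", "_": r"\_", "$": r"\$"}


def latex_escape(text: str) -> str:
    """Escape the handful of special characters likely to show up in results."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


def render_macros(results: dict[str, str]) -> str:
    """Turn {"nEdits": "1,234"} into one \\newcommand line per result."""
    lines = ["% Auto-generated by the analysis script; do not edit by hand."]
    for name, value in sorted(results.items()):
        # LaTeX macro names may only contain letters (no digits or underscores).
        if not (name.isascii() and name.isalpha()):
            raise ValueError(f"macro name must be letters only: {name!r}")
        lines.append(f"\\newcommand{{\\{name}}}{{{latex_escape(value)}}}")
    return "\n".join(lines) + "\n"


def write_macros(results: dict[str, str], path: Path) -> None:
    """Write the macro file that the paper will \\input (hypothetical file name)."""
    path.write_text(render_macros(results), encoding="utf-8")
```

With something like write_macros({"nEdits": "1,234"}, Path("generated_numbers.tex")) at the end of the analysis, the paper would contain \input{generated_numbers} in the preamble and \nEdits in the text, so a re-run of the analysis updates every number at once.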

  1. Set up this marvelous R - Overleaf automation
  2. Google sheets downloading automation
  3. Auto-populating URLs in Google sheets to streamline content analysis
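
As a rough illustration of what Google Sheets downloading automation can look like, here is a sketch in Python. It assumes the sheet is shared so that anyone with the link can view it, and it relies on the CSV export URL pattern that Google Sheets exposes for such sheets. The function names are hypothetical, and this is not necessarily how the lab's own automation is written.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen


def sheet_csv_url(sheet_id: str, gid: int = 0) -> str:
    """Build the CSV export URL for one tab (gid) of a Google Sheet."""
    query = urlencode({"format": "csv", "gid": gid})
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?{query}"


def parse_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def download_sheet(sheet_id: str, gid: int = 0) -> list[dict[str, str]]:
    """Fetch one tab and return its rows. Needs network access and a link-shared sheet."""
    with urlopen(sheet_csv_url(sheet_id, gid)) as response:
        return parse_rows(response.read().decode("utf-8"))
```

The sheet ID is the long string in the sheet's URL between /d/ and /edit, and the gid is the number after gid= when a given tab is selected. Splitting the URL building and CSV parsing from the download keeps the parts that do not need a network easy to check.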

Building from Prior Efforts

You can save yourself a lot of time and confusion if you build from the work that others have done in the lab to streamline and automate their work.

Build your bibliography using our shared Zotero. If you have Dropbox wired up to your Overleaf, that means one or zero clicks to keep your bibliography up to date. Don't paste citation formatting straight from the web into your Overleaf or hand-format your citations: that way leads only to tears and the gnashing of teeth!

Our collection of LaTeX templates is helpful for making nice documents.

Revise and Resubmit

Revise and resubmit makes papers better but can be a time crunch, and if you have co-authors, it can get hard to tell what's been done or what certain revisions were trying to accomplish. When you get reviews back that ask you to make changes, you can save yourself a lot of annoyance if you follow this four-part method:

  1. Keep a copy of your original submission, including the main.tex file. You will need this for the LaTeX diff below. If you are using an .Rtex file because you followed the above advice about automation, please note that the main.tex file is hidden in Overleaf: click the little 'document' icon next to 'Recompile' (its mouseover tooltip says 'Logs and output files'), then 'Other logs and files' at the bottom right. The output.bbl file is there too; that's also useful because you'll want it for your upload to arXiv if you post a preprint. This is the first stage of responding to a review, but it's one you can do even before the reviews are back.
  2. Paste all reviews into a Google Sheet---feel free to use this as a template. This is useful for the second stage of responding to a review, when maybe you're working through some emotional responses, trying to figure out what to do, etc.
  3. Paste all reviews into a response-to-reviewers letter in the same Overleaf project where you wrote the paper. Place them below the \end{document} line so they won't show up. This is useful for the third stage of responding to a review: actually revising. As you revise to address each issue, move the comment up into the letter, quote it, and then write what you did. Mark and comment the bit of your paper you copy-pasted (something like: populate any edits made into reviewer response). It might feel a skosh redundant with the Google Sheet, and it is; think of it as double-entry bookkeeping to make sure nothing gets dropped.
  4. Show your work: some venues require (and reviewers often appreciate) a document that shows what was changed. If you use Word, that means turning on Track Changes. If you use LaTeX, latexdiff is a way to achieve this. Latexdiff is available on CommunityData:Kibo. Making sure you have a nice clean diff and letter is the final stage of responding to a review.
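
For the LaTeX route, the basic latexdiff invocation takes the old file, then the new file, and prints a marked-up .tex document to standard output. If you want that step scripted alongside the rest of your workflow, a minimal sketch in Python might look like the following. The function names and file paths are hypothetical, and it assumes latexdiff is installed and on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def latexdiff_command(old_tex: Path, new_tex: Path) -> list[str]:
    """Build the command line: latexdiff OLD NEW (the diff goes to stdout)."""
    return ["latexdiff", str(old_tex), str(new_tex)]


def write_diff(old_tex: Path, new_tex: Path, diff_tex: Path) -> None:
    """Run latexdiff and save the marked-up document to diff_tex."""
    if shutil.which("latexdiff") is None:
        raise RuntimeError("latexdiff not found on PATH")
    with diff_tex.open("w", encoding="utf-8") as out:
        # check=True raises CalledProcessError if latexdiff exits with an error.
        subprocess.run(latexdiff_command(old_tex, new_tex), stdout=out, check=True)
```

Called as, say, write_diff(Path("original_submission.tex"), Path("main.tex"), Path("diff.tex")), this leaves a diff.tex that compiles like any other LaTeX file and shows additions and deletions. The old file here is the copy of main.tex saved in step 1.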

Please note that the cdsc_examples repository contains examples of past revision processes.