CommunityData:Keeping Track of Metadata

It's one thing to have a file full of data -- but where did the data come from? What process did you go through to get it? If you're tracking a lot of data and sources, how can your whole process be reproduced?

Ways to track metadata:

  • a file in a smart location, called something useful like ABOUT.txt or README.txt, where you describe the data: where it came from and what you did to get it
  • fetch files with a script instead of by hand. Even if it's just a dozen clicks, what if you have to do those dozen clicks 15 times? Now imagine it's two years later, a deadline is looming, and Reviewer 2 doesn't like how you've cited your data source -- which dozen clicks did you do, all those many moons ago? You can use a list of URLs, one per line, as input to wget: wget -i myGroovySourcesAreAllInHere.txt
  • YAML files -- a simple text format that can be read in programmatically and is easy to maintain by hand. There are a lot of tutorials online about writing YAML files; Red Hat publishes one of them.
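
The ideas in the list above can be combined: a script fetches every URL in a list and writes down where each file came from as it goes. What follows is only a minimal sketch using the Python standard library, not a recommended tool. The function names and the sources.yaml filename are made up for illustration, and the YAML is written by hand as simple key/value lines rather than with a YAML library.

```python
import hashlib
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path


def fetch_with_metadata(url, dest_dir):
    """Download one URL into dest_dir and return a provenance record."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last path segment of the URL.
    name = url.rstrip("/").rsplit("/", 1)[-1] or "download"
    with urllib.request.urlopen(url) as response:
        data = response.read()
    (dest_dir / name).write_bytes(data)
    return {
        "source_url": url,
        "local_file": name,
        "retrieved_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "sha256": hashlib.sha256(data).hexdigest(),
        "bytes": len(data),
    }


def write_yaml_sidecar(records, path):
    """Write the records as a YAML list of mappings, one key per line."""
    lines = []
    for record in records:
        for i, (key, value) in enumerate(record.items()):
            prefix = "- " if i == 0 else "  "
            # json.dumps produces a double-quoted scalar, which YAML accepts.
            lines.append(f"{prefix}{key}: {json.dumps(value)}")
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def fetch_all(url_list_file, dest_dir):
    """Fetch every URL in a one-per-line text file and record its origin."""
    text = Path(url_list_file).read_text(encoding="utf-8")
    urls = [line.strip() for line in text.splitlines() if line.strip()]
    records = [fetch_with_metadata(url, dest_dir) for url in urls]
    # Hypothetical sidecar name; it sits next to the downloaded files.
    write_yaml_sidecar(records, Path(dest_dir) / "sources.yaml")
    return records
```

Calling fetch_all("myGroovySourcesAreAllInHere.txt", "raw_data") would leave the downloads in raw_data along with a sources.yaml recording each URL, retrieval time, and checksum, which is the information Reviewer 2 is likely to ask about two years later.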