CommunityData:Keeping Track of Metadata

Revision as of 08:28, 5 February 2024 by Kaylea

It's one thing to have a file full of data -- but where did the data come from? What process did you go through to get it? If you're tracking a lot of data and sources, how can your whole process be reproduced?

Ways to track metadata:

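 * a file in a smart location (right next to the data is a good bet), called something useful like ABOUT.txt or README.txt, where you describe the data: where it came from and what process you went through to get it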
 * fetch files by script instead of by hand. Even if it's just a dozen clicks, what if you have to do those dozen clicks 15 times? Now imagine it's two years later, a deadline is looming, and Reviewer 2 doesn't like how you've cited your data source -- which dozen clicks did you do, all those many moons ago? You can give wget a list of URLs (one per line) as input: wget -i myGroovySourcesAreAllInHere.txt
 * YAML files -- a simple text format that is easy to maintain by hand and can be read in programmatically. There are a lot of tutorials online about making YAML files; Red Hat's is just one of them.
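
One way to combine these ideas is to let the fetching script write the metadata for you. The Python sketch below is only an illustration, not an established CommunityData tool: it reads a list of URLs (one per line, the same kind of file you would hand to wget -i), downloads each one, and drops a small YAML-style file next to it. The function name fetch_with_metadata, the .meta.yaml suffix, and the field names are all made up for this example.

```python
import hashlib
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlparse


def fetch_with_metadata(url_list, out_dir):
    """Download each URL listed in url_list; write NAME.meta.yaml beside each file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for line in Path(url_list).read_text().splitlines():
        url = line.strip()
        if not url:  # skip blank lines
            continue
        # Name the local file after the last path segment of the URL.
        name = Path(urlparse(url).path).name or "index.html"
        with urllib.request.urlopen(url) as response:
            data = response.read()
        target = out_dir / name
        target.write_bytes(data)
        # Hypothetical field names: record what Reviewer 2 might ask about.
        record = {
            "source_url": url,
            "retrieved_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "sha256": hashlib.sha256(data).hexdigest(),
            "bytes": len(data),
        }
        # json.dumps produces double-quoted strings and plain integers,
        # both of which are also valid YAML values.
        meta_lines = [f"{key}: {json.dumps(value)}" for key, value in record.items()]
        (out_dir / f"{name}.meta.yaml").write_text("\n".join(meta_lines) + "\n")
        saved.append(target)
    return saved
```

Calling fetch_with_metadata("myGroovySourcesAreAllInHere.txt", "raw_data") would leave each download in raw_data/ with a matching .meta.yaml file beside it. Note that two URLs ending in the same file name would overwrite each other in this sketch, so a real script would need a naming scheme that avoids collisions.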