CommunityData:Keeping Track of Metadata

Latest revision as of 08:28, 5 February 2024

It's one thing to have a file full of data -- but where did the data come from? What process did you go through to get it? If you're tracking a lot of data and sources, how can your whole process be reproduced?

Ways to track metadata:

  • Put a file in a sensible location, named something useful like ABOUT.txt or README.txt, where you describe the data: where it came from and how you got it.
  • Fetch files by scripting instead of by hand. Even if it's just a dozen clicks, what if you have to do those dozen clicks 15 times? Now imagine it's two years later, a deadline is looming, and Reviewer 2 doesn't like how you've cited your data source -- which dozen clicks did you do, all those many moons ago? You can use a list of URLs as input to wget: wget -i myGroovySourcesAreAllInHere.txt (where myGroovySourcesAreAllInHere.txt is your own text file with one URL per line).
  • Use YAML files. YAML is a simple text format that can be read in programmatically and is easy to maintain by hand. There are a lot of tutorials online about writing YAML files; this one from RedHat is just one of them: https://www.redhat.com/en/topics/automation/what-is-yaml
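
The three ideas above can be combined in one small script. What follows is only a sketch under some assumptions, not a recommended tool: it reads a list of URLs (one per line, the same kind of file you would hand to wget -i), downloads each one, and writes a little metadata file next to it. The function names, the ".meta.yaml" suffix, and the recorded fields (source_url, retrieved_at, sha256) are all made up for illustration. To stay within the Python standard library, it writes plain `key: "value"` lines by hand, which should parse as YAML for simple string values like these; a real project might prefer a proper YAML library.

```python
import hashlib
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path


def fetch(url: str, dest_dir: Path) -> Path:
    """Download one URL into dest_dir and return the local path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming rule: use the last path segment of the URL.
    name = url.rstrip("/").rsplit("/", 1)[-1] or "index.html"
    dest = dest_dir / name
    urllib.request.urlretrieve(url, str(dest))
    return dest


def describe(path: Path, source_url: str) -> dict:
    """Build a small provenance record for a file already on disk."""
    return {
        "file": path.name,
        "source_url": source_url,
        "retrieved_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }


def write_sidecar(path: Path, record: dict) -> Path:
    """Write the record next to the data file as simple `key: "value"` lines."""
    sidecar = path.with_name(path.name + ".meta.yaml")
    # json.dumps gives a double-quoted string for each value.
    lines = [f"{key}: {json.dumps(value)}" for key, value in record.items()]
    sidecar.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return sidecar


def fetch_all(url_list: Path, dest_dir: Path) -> None:
    """Fetch every URL in url_list (one per line) and record where it came from."""
    for line in url_list.read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if url:  # skip blank lines
            local = fetch(url, dest_dir)
            write_sidecar(local, describe(local, url))
```

Calling fetch_all(Path("myGroovySourcesAreAllInHere.txt"), Path("raw_data")) would then leave each downloaded file with a sidecar saying where it came from and when, so the answer to "which dozen clicks did you do?" is sitting right next to the data.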