CommunityData:Keeping Track of Metadata
Latest revision as of 06:28, 5 February 2024
It's one thing to have a file full of data -- but where did the data come from? What process did you go through to get it? If you're tracking a lot of data and sources, how can your whole process be reproduced?
Ways to track metadata:
- a file in a smart location (right next to the data is a good bet), called something useful like ABOUT.txt or README.txt, where you describe the data: where it came from and what you did to get it
- fetch files by scripting instead of by hand. Even if it's just a dozen clicks, what if you have to do those dozen clicks 15 times? Now imagine it's two years later, a deadline is looming, and Reviewer 2 doesn't like how you've cited your data source -- which dozen clicks did you do, all those many moons ago? You can use a list of URLs (a plain text file, one URL per line) as input to wget: wget -i myGroovySourcesAreAllInHere.txt
- YAML files -- YAML is a simple text format that can be read in programmatically and is easy to maintain by hand. There are a lot of tutorials online about writing YAML files; this one from Red Hat is just one of them: https://www.redhat.com/en/topics/automation/what-is-yaml
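
To tie the three ideas together, here is a minimal sketch (not an official CommunityData tool) of a Python script that writes a small metadata file next to data you have already fetched with wget -i. Everything named here is an assumption made for illustration: the function names, the metadata.yaml file name, and the fields it records (retrieval time, the command, the source URLs, and a SHA-256 checksum per file). It uses only the standard library; each value is written as a JSON-style quoted string, which YAML parsers should also accept as a double-quoted string.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def read_url_list(path):
    """Return the non-blank lines of a URL list (the file you hand to wget -i)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def sha256_of(path):
    """Checksum a file in chunks so large data files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_metadata(data_dir, url_list, out_name="metadata.yaml"):
    """Write a hand-editable YAML file describing the files in data_dir.

    Hypothetical layout: assumes the files were fetched with
    `wget -i <url_list>` and that this runs right after the download.
    """
    data_dir = Path(data_dir)
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [
        "retrieved: " + json.dumps(now),
        "command: " + json.dumps("wget -i " + Path(url_list).name),
        "sources:",
    ]
    lines += ["  - " + json.dumps(url) for url in read_url_list(url_list)]
    lines.append("files:")
    for path in sorted(data_dir.iterdir()):
        # Skip subdirectories and the metadata file itself.
        if path.is_file() and path.name != out_name:
            lines.append("  - name: " + json.dumps(path.name))
            lines.append("    sha256: " + json.dumps(sha256_of(path)))
    out_path = data_dir / out_name
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out_path
```

After downloading into a directory called data, something like write_metadata("data", "myGroovySourcesAreAllInHere.txt") would leave data/metadata.yaml behind. Because it is plain text, you can still open it and add notes by hand, which is the point of picking YAML in the first place.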