CommunityData:Keeping Track of Metadata

Latest revision as of 06:28, 5 February 2024

It's one thing to have a file full of data -- but where did the data come from? What process did you go through to get it? If you're tracking a lot of data and sources, how can your whole process be reproduced?

Ways to track metadata:

* A file in a sensible location, named something useful like ABOUT.txt or README.txt, where you describe the data.
* Fetch files by script instead of by hand. Even if it's just a dozen clicks, what if you have to do those dozen clicks 15 times? Now imagine it's two years later, a deadline is looming, and Reviewer 2 doesn't like how you've cited your data source: which dozen clicks did you do, all those many moons ago? You can use a list of URLs as input to wget: wget -i myGroovySourcesAreAllInHere.txt
* YAML files. YAML is a simple text format that can be read in programmatically and is easy to maintain by hand. There are a lot of tutorials online about making YAML files; [https://www.redhat.com/en/topics/automation/what-is-yaml this one from RedHat] is just one of them.
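
The ideas in the list above can be combined: once a file has been fetched, a small script can write a YAML "sidecar" file next to it recording where it came from. What follows is only a minimal sketch, not something this page prescribes. The helper names (describe_file, to_yaml, write_sidecar), the field names, and the .meta.yaml suffix are all assumptions made for illustration. It uses only the Python standard library and writes each value with json.dumps, which works because JSON strings and numbers are also valid YAML scalars.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path, source_url, retrieved_at=None):
    """Build a flat metadata record for one downloaded file.

    The field names here are illustrative assumptions, not a standard.
    """
    path = Path(path)
    data = path.read_bytes()
    if retrieved_at is None:
        retrieved_at = datetime.now(timezone.utc)
    return {
        "file": path.name,
        "source_url": source_url,
        "retrieved_at": retrieved_at.isoformat(timespec="seconds"),
        "size_bytes": len(data),
        # A checksum lets you confirm later that the file has not changed.
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def to_yaml(record):
    """Render a flat dict as YAML, one 'key: value' line per entry.

    Only handles simple keys with string or number values.
    """
    return "".join(
        f"{key}: {json.dumps(value, ensure_ascii=False)}\n"
        for key, value in record.items()
    )


def write_sidecar(path, source_url, retrieved_at=None):
    """Write '<file name>.meta.yaml' next to the data file and return its path."""
    path = Path(path)
    sidecar = path.with_name(path.name + ".meta.yaml")
    record = describe_file(path, source_url, retrieved_at)
    sidecar.write_text(to_yaml(record), encoding="utf-8")
    return sidecar
```

Calling write_sidecar("data.csv", "https://example.org/data.csv") (a placeholder URL) would leave a data.csv.meta.yaml file beside the data. The sidecar is plain text, so it stays easy to edit by hand, and it can be read back with any YAML parser.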