The [https://dataverse.harvard.edu/ Harvard Dataverse] is an archive for datasets and code hosted by Harvard but available to anybody. The [https://dataverse.harvard.edu/dataverse/communitydata Community Data Science Collective Dataverse] is a portal within the Harvard Dataverse for CDSC projects, managed by our team.

==How should I add things to the CDSC Dataverse?==

If you have not done so before, '''everyone should begin by''':

# [https://dataverse.harvard.edu/dataverseuser.xhtml;?editMode=CREATE&redirectPage=%2Fdataverse.xhtml%3Falias%3Dcommunitydata Creating an account]. You can use your institutional login or create a new account with a username/email and password.
# Asking an existing administrator to make you a member/administrator of the CDSC Dataverse (everybody in the group should be an admin, so it's best to ask on IRC).

You now need to choose between two options. (1) Create your dataset within the CDSC Dataverse. This is usually best because anyone in the group can manage it. (2) Create it outside the CDSC Dataverse and "link" it. You will typically do this when there is access-restricted data that should ''not'' be available to everyone in the group.

If you want to '''create a dataset in the CDSC Dataverse''', create your replication package or dataset release as follows:

# Go to the [https://dataverse.harvard.edu/dataverse/communitydata CDSC Dataverse Page]
# Click "+ Add Data" → "New Dataset"
# Make sure that "Host Dataverse" says "Community Data Science Collective Dataverse"
# Upload your files and fill out the metadata fields (at a minimum, include a README.txt file that explains how to use your data and code)
# Publish/release!

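Before uploading, it can help to confirm that the package really contains the README.txt that the steps above ask for. The following is a minimal sketch, not part of any Dataverse tooling: the function name <code>check_package</code> and its messages are hypothetical, and it assumes your package is a local directory with README.txt at its top level.

```python
from pathlib import Path


def check_package(package_dir):
    """Return a list of problems found in a replication package directory.

    Hypothetical helper: it only checks for a non-empty README.txt at the
    top level of the package, which is the minimum the steps above ask for.
    """
    root = Path(package_dir)
    problems = []
    readme = root / "README.txt"
    if not readme.is_file():
        problems.append("README.txt is missing")
    elif not readme.read_text(encoding="utf-8", errors="replace").strip():
        problems.append("README.txt is empty")
    return problems
```

Calling <code>check_package("my_replication_package")</code> (a placeholder directory name) returns an empty list when the README is present and non-empty.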
If you want to '''create your dataset outside the CDSC Dataverse but have it listed''', you will need to:

# Go to the [https://dataverse.harvard.edu/ Main Harvard Dataverse Page]
# Click "+ Add Data" → "New Dataset"
# Make sure that "Host Dataverse" says "Harvard Dataverse"
# Upload your files and fill out the metadata fields (at a minimum, include a README.txt file that explains how to use your data and code)
# Publish/release!
# Click the "Link Dataset" button on your dataset page and then type/select the ''Community Data Science Collective'' Dataverse.

Finally, '''if you have already created a dataset and want it moved into the CDSC Dataverse''', click the Support button at the top of each page and write a message asking the support staff to move it for you. They usually do this very quickly.

==An open science workflow using dataverse==
===Step 1: Anonymous while under review===
Some publications ask for an anonymized release of code and data. This is easy to do without breaking double-blind anonymity. Generate a code and data package that doesn't include information that will identify you, and then when uploading '''do not fill out metadata fields with authorship information''' and '''do not release (publish) your archive'''. Delete your name from any fields where it is autofilled. Once your files are uploaded, under 'Edit Dataset', there's an option to 'Generate Private URL'. See details in the [https://guides.dataverse.org/en/6.0/user/dataset-management.html#private-url-to-review-unpublished-dataset Dataverse user guide]. This creates a blue box at the top of your archive which reads "Unpublished Dataset Private URL – Privately share this dataset before it is published:". That is the link to share with your reviewers (test it in another browser to be sure that it doesn't reveal anything).
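
One way to check a package for identifying information before uploading is to search every file for strings such as your name, username, or email address. The sketch below is only an illustration under stated assumptions: the function name <code>find_identifiers</code> is hypothetical, the package is assumed to be a local directory, and a plain substring search will not catch everything (for example, metadata embedded in PDFs or other binary formats).

```python
from pathlib import Path


def find_identifiers(package_dir, identifiers):
    """Return (relative path, line number, identifier) tuples for each hit.

    Hypothetical helper. Matching is case-insensitive and identifiers are
    reported in lowercase. A line number of 0 means the identifier appears
    in the file's path rather than in its contents. Files are read as text
    with undecodable bytes replaced, so binary files are searched only roughly.
    """
    root = Path(package_dir)
    needles = [s.lower() for s in identifiers if s]
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        # File and directory names can reveal identity too.
        for needle in needles:
            if needle in rel.lower():
                hits.append((rel, 0, needle))
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            for needle in needles:
                if needle in lowered:
                    hits.append((rel, line_no, needle))
    return hits
```

For example, <code>find_identifiers("my_package", ["jdoe", "Jane Doe"])</code> (placeholder values) would flag a script that contains a home-directory path such as <code>/home/jdoe/project</code>. An empty result does not prove the package is anonymous, so it is still worth reading through the files and testing the private URL as described above.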
===Step 2: Identified after acceptance===