CommunityData:Dataverse

Revision as of 07:06, 30 December 2023

How should I use the CDSC dataverse?

1. Create an account; you will likely be able to use your institutional login.
2. Ask to join the CDSC group.
3. Create your replication package or dataset release.
4. Upload your files and fill out the metadata fields (at a minimum, include a README.txt file that explains how to use your data and code).
5. Release!

An open science workflow using dataverse

There are many ways to follow open science practices. One way to fit the CDSC dataverse into your open science workflow is as follows:

Step 1: Anonymous while under review

Some publications ask for an anonymized release of code and data. This is easy to do without breaking double-blind anonymity. Build a code and data package that contains nothing that identifies you. When uploading, leave the authorship metadata fields blank, delete your name wherever Dataverse autofills it, and do not release (publish) the archive. Once your files are uploaded, 'Edit Dataset' offers an option to 'Generate Private URL'; see the user guide for details. This adds a blue box at the top of your archive that reads "Unpublished Dataset Private URL – Privately share this dataset before it is published:" followed by a link. That is the link to share with your reviewers. Test it in another browser first to be sure that it doesn't reveal anything.
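Before uploading, it can help to scan the package for strings that would identify you. The following is a minimal sketch, not part of Dataverse: the function name `find_identifying_strings` and the idea of passing in a list of names, usernames, or email addresses are assumptions made for illustration. It only reads files that decode as UTF-8 text, so binary files and embedded metadata (for example in PDFs or notebooks with outputs) still need a manual look.

```python
from pathlib import Path


def find_identifying_strings(package_dir, identifiers):
    """Return (relative path, line number, identifier) for each match.

    Matching is case-insensitive. A line number of 0 means the match was
    in the file's path rather than its contents. Files that cannot be
    decoded as UTF-8 text are skipped and need a manual check.
    """
    root = Path(package_dir)
    needles = [s.lower() for s in identifiers]
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        # File and directory names can leak identity too.
        for needle in needles:
            if needle in rel.lower():
                hits.append((rel, 0, needle))
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # probably binary; check by hand
        for lineno, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            for needle in needles:
                if needle in lowered:
                    hits.append((rel, lineno, needle))
    return hits
```

You would call it with the directory you plan to upload and whatever strings might give you away, for example `find_identifying_strings("my_package", ["your name", "your_username"])` (both arguments here are placeholders), and fix every hit before generating the private URL.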

Step 2: Identified after acceptance

You might like to include a link to your dataverse in your paper; you might also want to add it to your accepted preprint before uploading the paper to arXiv. Fill out as many metadata fields as you find useful (authors, description, subject, keywords), ask a colleague to take a look at your archive, and then release it.

Step 3: Updated after publication

After your paper is published and its DOI goes live, add that information to your archive (the 'Related Publication' metadata field) so that others can find it.

Potential questions and problems

Oh no, I made an error in my archive!

After an archive is released, you can still make updates. But if the previous version is bad enough that you don't want it to be findable, the archive needs to be deleted or 'deaccessioned'.

What's this message about my data format and 'tabular ingest failed'?

Dataverse tries to present your data in tabular form so that people can view it live without downloading, and it had trouble parsing what you uploaded. You can reformat the file, or you can ignore this error.
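If you decide to reformat, a quick local check can narrow down where the file is irregular. The sketch below is an assumption about one possible cause, not something the Dataverse message confirms: it looks for rows in a delimited text file whose field count differs from the header row. The function name `inconsistent_rows` is made up for this example.

```python
import csv


def inconsistent_rows(csv_path, delimiter=","):
    """Return (line number, field count) for rows that differ from the header.

    The line number is the count of physical lines read so far, so for a
    quoted field that spans several lines it points at the record's last line.
    """
    problems = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, None)
        if header is None:
            return problems  # empty file: nothing to compare
        expected = len(header)
        for row in reader:
            if len(row) != expected:
                problems.append((reader.line_num, len(row)))
    return problems
```

An empty result means every row has as many fields as the header; it does not guarantee that ingest will succeed, since other things (encodings, unsupported formats) may also be involved.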

My replication package has a main directory and subdirectories. How do I represent this?

Dataverse assumes everything is in the root. If you have subdirectories, upload the files from those subdirectories and then specify each file's path using the UI field that only appears after the upload.
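For a package with many files, typing paths by hand gets tedious. The sketch below works out, for each local file, the path you would enter. It assumes you might script the upload through the Dataverse native API, whose add-file call takes a `jsonData` form field that can carry a `directoryLabel` key; check the API guide for the Dataverse version behind the CDSC dataverse before relying on that. The function name `plan_uploads` is hypothetical, and the code only builds the plan; it does not contact any server.

```python
import json
from pathlib import Path


def plan_uploads(package_dir):
    """Return one (local path, jsonData string) pair per file in the package.

    Files in the package root get an empty jsonData object; files in
    subdirectories get a directoryLabel holding their parent directory,
    relative to the package root, with forward slashes.
    """
    root = Path(package_dir)
    plan = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        parent = path.relative_to(root).parent.as_posix()
        metadata = {}
        if parent != ".":
            metadata["directoryLabel"] = parent
        plan.append((str(path), json.dumps(metadata)))
    return plan
```

Even if you upload through the web UI, printing this plan gives you a checklist of the file path to enter for each file.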