Common questions about data repositories
What is a digital object identifier (DOI)?
When a researcher uploads a document to an online data repository, the repository assigns it a digital object identifier (DOI). A DOI is a globally unique, persistent string that identifies your work permanently. A data repository can assign a DOI to any document. The DOI is registered together with metadata that gives users relevant information about the object, such as the title, author, keywords, year of publication, and the URL where the document is stored.
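As a rough illustration of how a DOI is used in practice, the sketch below checks that a string has the usual "10.prefix/suffix" shape and turns it into a link through the public resolver at https://doi.org/. The function names and the pattern are assumptions made for this example; the pattern is deliberately loose and is not the full DOI syntax, and the DOI shown is used only as a sample value.

```python
import re
from urllib.parse import quote

# Loose check for the common "10.<registrant>/<suffix>" shape.
# This is an illustrative assumption, not the complete DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def is_plausible_doi(doi: str) -> bool:
    """Return True if the string looks like a DOI."""
    return bool(DOI_PATTERN.match(doi.strip()))


def doi_to_url(doi: str) -> str:
    """Build a resolver link for a DOI, percent-encoding unsafe characters."""
    doi = doi.strip()
    if not is_plausible_doi(doi):
        raise ValueError(f"Not a plausible DOI: {doi!r}")
    # Keep the slash between prefix and suffix; encode anything unsafe.
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("10.1000/182"))
```

Opening the resulting link in a browser sends you to wherever the object is currently stored, which is why a DOI is a more stable citation than a direct URL.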
How do I find a ‘FAIR-aligned’ repository?
The repository finder tool, developed by DataCite, allows you to search for certified repositories that support the FAIR data principles. The FAIR data principles aim to make research data more Findable, Accessible, Interoperable, and Reusable (FAIR). Both FAIRsharing and Re3Data provide information on an array of criteria to help you identify the repositories most suited to your needs.
Should I use a discipline-specific repository?
If your funder does not have a preferred repository, you may wish to use a discipline-specific repository that is frequently used in your field of research. This type of repository will make it easy for your research community to find your data. There are many repositories of this type, including GEO or GenBank for genetic data, and the UK Data Service for social sciences and humanities data.
What is versioning?
Some repositories accommodate changes to deposited datasets through versioning. Selecting a repository that features versioning gives you the flexibility to add new data, restructure, and improve your dataset. Each version of your dataset is uniquely identifiable and maintained – meaning others can find, access, reuse, and cite whichever version of the dataset they require.
How do I share de-identified research data?
Repositories vary widely, so it’s essential that you choose the repository best suited to your research, whether it is a subject-specific, general, funder, or institutional repository. If you would like to share de-identified data, one option is NICHD DASH. This repository allows researchers to store and access de-identified data from NICHD-funded research for secondary research use.
Can I share research data with restricted access?
Restricted data deposit is possible. If you need to preserve the anonymity of study participants in clinical datasets, some repositories are suitable for datasets that require restricted access. We suggest contacting repositories directly to determine which have the data access controls best suited to the specific requirements of your study.
Do I have to pay to deposit data to a repository?
Always check whether your repository requires a data publication fee. Not all repositories require data publication charges, and if your chosen repository does require a fee, you could still be entitled to sponsorship by a publisher or funder. Zenodo and Figshare both allow registered users to deposit data free of charge. However, Dryad charges a data publication fee.
What about my software and code?
Software and code are important research outputs. In addition to using a version control platform such as GitHub, you should deposit your source code in a data repository where it will be assigned a unique identifier. Using such a repository helps ensure your code is openly and permanently available.
Choosing a repository for your research data might seem difficult at first, but sharing your data openly is vital to increasing the reproducibility of research. In turn, you can expect greater visibility for your work and a wider potential impact.