How to choose a suitable data repository for your research data

June 9, 2023 7 mins

F1000

Governments, funders, and institutions worldwide are increasingly introducing open data policies and mandates to encourage researchers to share their research data openly. Depositing your data in a publicly accessible research data repository that assigns a persistent identifier (PI or PID) ensures that your dataset remains available to humans and machines in the future. National institutes, funders, and journals often maintain a list of endorsed repositories for your use. You may need to set out your intention to deposit your research data in a repository as part of a data management plan (DMP). Still, choosing the best repository from such lists can often be daunting. Here, we offer some preliminary guidance on selecting the most suitable repository for your research data.

Where to share your data?

You know you want to make your data openly available, but where should you host it? Some researchers opt to host their data solely on a laboratory website or as part of a publication’s supplementary material. However, sharing data (or any other research outputs) in these ways makes it harder for others to find and reuse it. That’s where data repositories come in.

What is a data repository?

According to the Registry of Research Data Repositories (re3data.org), a global registry of research data repositories, a repository is an online storage infrastructure where researchers store data, code, and other research outputs for scholarly publication. Research data means information objects generated by scholarly projects, for example through experiments, measurements, surveys, or interviews. Depositing your data in a publicly accessible, recognized repository ensures that your dataset continues to be available to both humans and machines in a usable form.

An open access data repository stores data, including scientific data from research projects, in a way that gives anyone immediate access, with no limitations on access to the repository. Such repositories make data findable, accessible, and usable in the long term by using sustainable file formats and by providing persistent identifiers and informative descriptive data (metadata).

Choosing a data repository

Nowadays, it is widely considered best practice to deposit your data in a publicly available repository, where it is assigned a persistent identifier (PI or PID) and can be accessed by anyone, anywhere. Where you deposit your data will depend on any applicable legal and ethical factors, who funded the work, and where you hope to publish. However, there are a few simple questions you can ask yourself to make selecting an appropriate repository easier.

Question #1: Does your data contain personal or sensitive information that cannot be anonymized?

If you answered ‘yes’ to this question, consider a controlled access repository.

There may be cases where openly sharing data is not feasible due to ethical or confidentiality considerations. Depending on what the Institutional Review Board approving your study said about data sharing, and what your participants consented to, it may still be possible to make your data accessible to authenticated users via a controlled-access repository or a generalist repository that allows you to limit access to your data.

Some of the repositories that allow you to limit access to your data include:

Figshare – You can generate a ‘private sharing link’ for free. You can send this link by email, and the recipient can access the data without logging in or having a Figshare account.
Zenodo – Funded by CERN, OpenAIRE, and Horizon 2020, Zenodo lets users deposit restricted files and share access with others if they meet certain requirements.
OSF – You can make your project private or public and alternate between the two settings.

If you answered ‘no’ to this question, move on to question #2.

Question #2: Is there a discipline-specific repository for your dataset?

If you answered ‘yes’ to this question, consider a discipline-specific repository.

Research data differs significantly across disciplines. Discipline-specific repositories offer specialist domain knowledge and curation expertise for particular data types. Using a discipline-specific repository can also make your data more visible to others in your research community. We recommend speaking to your institutional librarian, funder, or colleagues for guidance on choosing a repository relevant to your discipline.

If you answered ‘no’ to this question, move on to question #3.

Question #3: Does your institutional repository accept data?

If you answered ‘yes’ to this question, consider your institutional repository.

Many institutions support their researchers by providing repository infrastructure for managing and depositing data. Institutional repositories that accept datasets provide stewardship, helping to ensure that your dataset is preserved and remains accessible.

If you answered ‘no’ to this question, consider a generalist data repository.

Generalist data repositories accept datasets regardless of discipline or institution. These repositories support a wide variety of file types and are particularly useful where a discipline-specific repository does not exist.
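The three questions above can be read as a simple decision flow. The following is only a minimal sketch of that flow in Python; the class, field, and function names are illustrative assumptions, and in practice legal, ethical, funder, and journal requirements should also inform the choice.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Answers to the three questions in this guide (field names are illustrative)."""
    has_sensitive_data: bool         # Question #1: sensitive data that cannot be anonymized?
    has_discipline_repository: bool  # Question #2: is there a discipline-specific repository?
    institution_accepts_data: bool   # Question #3: does the institutional repository accept data?


def recommend_repository_type(dataset: Dataset) -> str:
    """Walk the questions in order and return the first repository type that applies."""
    if dataset.has_sensitive_data:
        return "controlled-access repository"
    if dataset.has_discipline_repository:
        return "discipline-specific repository"
    if dataset.institution_accepts_data:
        return "institutional repository"
    # None of the earlier options applied, so fall back to a generalist repository.
    return "generalist repository"


# Example: no sensitive data, and a discipline-specific repository exists.
print(recommend_repository_type(Dataset(False, True, True)))
```

The order of the checks mirrors the order of the questions, so a dataset with sensitive data is routed to controlled access before any other option is considered.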

Some examples of generalist data repositories include Figshare, Zenodo, and OSF, all of which are described under question #1.

Common questions about data repositories

What is a digital object identifier (DOI)?

When you upload a dataset or document to an online data repository, the repository typically assigns it a digital object identifier (DOI). A DOI is a globally unique and persistent string that identifies your work permanently. A data repository can assign a DOI to any document. The DOI is registered together with metadata that gives users relevant information about the object, such as the title, author, keywords, year of publication, and the URL where it is stored.
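To make the structure of a DOI concrete, here is a small, hedged sketch in Python. It relies only on two general facts: a DOI consists of a prefix beginning with "10." and a suffix, separated by a slash, and a DOI can be turned into a link through the doi.org resolver. The function names and the example DOI are hypothetical and used purely for illustration.

```python
from typing import Tuple


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into its prefix (the registrant) and suffix (the item)."""
    # Split at the first slash only; the suffix itself may contain slashes.
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Build the resolver link that redirects to wherever the object is stored."""
    split_doi(doi)  # validate before building the URL
    return f"https://doi.org/{doi.strip()}"


example = "10.1234/example.dataset.5678"  # hypothetical DOI, for illustration only
print(split_doi(example))
print(doi_to_url(example))
```

Because the resolver link stays the same even if the repository moves the file, citing the DOI rather than the storage URL is what keeps a reference stable over time.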

How do I find a ‘FAIR aligned’ repository?

The Repository Finder tool, developed by DataCite, allows you to search for certified repositories that support the FAIR data principles. The FAIR data principles aim to make research data more Findable, Accessible, Interoperable, and Reusable (FAIR). Both FAIRsharing and re3data provide information on an array of criteria to help you identify the repositories most suited to your needs.

Should I use a discipline-specific repository?

If your funder does not have a preferred repository, you may wish to use a discipline-specific repository that is frequently used in your field of research. This type of repository will make it easy for your research community to find your data. There are many repositories of this type, including GEO or GenBank for genetic data, and the UK Data Service for social sciences and humanities data.

What is versioning?

Some repositories accommodate changes to deposited datasets through versioning. Selecting a repository that features versioning gives you the flexibility to add new data, restructure, and improve your dataset. Each version of your dataset is uniquely identifiable and maintained – meaning others can find, access, reuse, and cite whichever version of the dataset they require.
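As a rough illustration of the versioning idea, the Python sketch below models a dataset in which every published version keeps its own identifier and earlier versions are never overwritten. The class, the ".vN" suffix scheme, and the base identifier are all assumptions made for this example; repositories differ in how they actually label versions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VersionedDataset:
    """A dataset whose every version keeps its own identifier (hypothetical scheme)."""
    base_id: str
    versions: List[str] = field(default_factory=list)  # one description per version

    def publish(self, description: str) -> str:
        """Record a new version and return its identifier; older versions stay untouched."""
        self.versions.append(description)
        return f"{self.base_id}.v{len(self.versions)}"

    def identifier_for(self, number: int) -> str:
        """Return the identifier of an existing version (numbering starts at 1)."""
        if not 1 <= number <= len(self.versions):
            raise KeyError(f"no version {number}")
        return f"{self.base_id}.v{number}"


ds = VersionedDataset("10.1234/example.dataset")  # hypothetical identifier
print(ds.publish("initial deposit"))
print(ds.publish("added new measurements"))
print(ds.identifier_for(1))  # the first version is still citable
```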

How do I share de-identified research data?

Repositories vary widely, so it’s essential to choose the repository best suited to your research, whether it is a subject-specific, generalist, funder, or institutional repository. If you would like to share de-identified data, one option is NICHD DASH. This repository allows researchers to store and access de-identified data from NICHD-funded research for the purposes of secondary research use.

Can I share research data with restricted access?

Restricted data deposit is possible. If you need to preserve study participant anonymity in clinical datasets, then there are repositories suitable for datasets requiring restricted data access. We suggest contacting repositories directly to determine those with data access controls best suited to the specific requirements of your study.

Do I have to pay to deposit data to a repository?

Always check whether your repository requires a data publication fee. Not all repositories require data publication charges, and if your chosen repository does require a fee, you could still be entitled to sponsorship by a publisher or funder. Zenodo and Figshare both allow registered users to deposit data free of charge. However, Dryad charges a data publication fee.

What about my software and code?

Software and code are important research outputs. In addition to using a version control system such as Git (hosted, for example, on GitHub), you should deposit your source code in a data repository where it will be assigned a unique identifier. Using such a repository helps ensure your code is openly and permanently available.

Choosing a repository for your research data might seem difficult at first, but sharing your data openly is vital to increasing the reproducibility of research. In turn, you can expect greater visibility for your work and a wider potential impact.
