Open data
Everything you need to know about sharing your research data openly
Why choose open data?
Open data plays a vital role in the research landscape by accelerating the pace of discovery, promoting data reuse, and enabling the testing and validation of research findings. Over the past decade, open data has become a priority for academic stakeholders worldwide, including governments, funders, institutions, and publishers.
Today, nearly all major scientific journals have an open data policy. However, according to the 2022 State of Open Data report, many academics still find sharing data difficult.
On this page, we’ll provide information and guidance to help you get to grips with open data. We’ll also answer your open data questions, including:
- What is open data?
- What are the different types of research data?
- What are the benefits of sharing my research data openly?
- How do I share my research data?
- Why is it important to make my data FAIR?
What is open data?
Open data is data that is available for everyone to access, use, and share. For researchers, this refers to any datasets collected or created as part of your research project.
In some cases, sharing data is not appropriate for legal, ethical, data protection, or confidentiality reasons. F1000 recommends that researchers make their data as open as possible and as closed as necessary. In other words, you should restrict access to data only where essential, for example, for security or confidentiality reasons.
There are many types of research data, both quantitative and qualitative. These include:
- Survey results
- Software
- Models and algorithms
- Interviews and transcripts
- Images, videos, and audio files
- Genome sequences
Benefits of open data
Sharing your data benefits your career, other researchers, and society. So how can you benefit from open data?
- Benefits for your career
- Benefits for the community
- Benefits for society
How to share your research data
Write a data management plan before your project begins
Planning how you will manage and share your data makes it much easier to open your data at the end of your project. Before research begins, create a detailed Data Management Plan (DMP). A DMP is a living document that describes how your research outputs will be generated, stored, used, and shared, and it can change and evolve throughout your research project. While not all funders and publishers require researchers to create a DMP, writing one helps ensure efficient data management and makes it easier to make your data FAIR.
Prepare the data for sharing
You’ve collected your data; now it’s time to prepare it for sharing. While some restrictions may make it impossible to share your dataset, in other cases you can share sensitive data provided you take the necessary precautions to protect the confidentiality of research participants. Once you’ve determined the extent to which you can share your data, you’ll need to format your data, label your files for sharing, and prepare any additional materials needed to understand and use the data. For example, you may include a data dictionary and details of any software needed to process the data. Different disciplines and data repositories may have different standards for formatting data, so check these before you get started.
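If you prepare your data with scripts, the data dictionary step can be automated. The following is a minimal Python sketch, assuming a small tabular survey dataset. The variable names, units, and descriptions are hypothetical placeholders, and `write_data_dictionary` and `check_columns` are illustrative helpers, not part of any repository's tooling. The sketch writes the dictionary as CSV and flags any dataset column that the dictionary does not document.

```python
import csv
import io

# Hypothetical variables from a small survey dataset; every name, unit,
# and description below is an illustrative assumption.
DATA_DICTIONARY = [
    {"variable": "participant_id", "type": "string", "unit": "",
     "description": "Pseudonymous participant code"},
    {"variable": "age_years", "type": "integer", "unit": "years",
     "description": "Age at time of survey"},
    {"variable": "score", "type": "float", "unit": "points (0-10)",
     "description": "Self-reported wellbeing score"},
]

FIELDS = ["variable", "type", "unit", "description"]


def write_data_dictionary(entries, out):
    """Write the data dictionary as CSV to any text file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)


def check_columns(data_header, entries):
    """Return dataset columns that the dictionary does not document."""
    documented = {e["variable"] for e in entries}
    return [col for col in data_header if col not in documented]


if __name__ == "__main__":
    buffer = io.StringIO()
    write_data_dictionary(DATA_DICTIONARY, buffer)
    print(buffer.getvalue())
    # A dataset with an undocumented column ("site") is flagged before deposit.
    print(check_columns(["participant_id", "age_years", "score", "site"],
                        DATA_DICTIONARY))
```

To save the dictionary next to your data file, open the output with `newline=""`, as the `csv` module documentation recommends, for example `with open("data_dictionary.csv", "w", newline="") as f: write_data_dictionary(DATA_DICTIONARY, f)`.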
Deposit your data in a repository
A repository is an online infrastructure where researchers can store data, code, and other research outputs. Depositing your data in a publicly accessible, recognized repository ensures that your dataset remains available to both humans and machines in a usable form. A repository also preserves your data more securely over time than hosting it on a website. Plus, you’ll receive a persistent identifier (PID) that establishes ownership and enables others to cite the data. Your institutional librarian, funder, and colleagues can likely help you choose a repository relevant to your discipline.
Apply an open license to the data
Apply an open license to your data to permit others to reuse it with minimal restrictions. Permitting reuse supports reproducibility and transparency in research and allows others to build on your findings. The Creative Commons Public Domain Dedication (CC0) and the Creative Commons Attribution (CC BY) license are popular examples of open licenses. Both allow reusers to distribute, remix, adapt, and build upon the materials in any medium or format. The key difference is that CC0 does not require attribution, while CC BY requires reusers to credit the original creator.
Make your data easy to find
Always cite your dataset in your published article and include a data availability statement. A data availability statement is a short section of text that tells the reader how, where, and under what conditions the data associated with your research can be accessed and reused. Once your research is published, some repositories allow you to add the article’s Digital Object Identifier (DOI) to the metadata of your dataset, establishing a permanent link between these two outputs of your research. You can also publish a Data Note to maximize the potential of your research data. Data Notes are a peer-reviewed article type that describes why and how your data was collected, analyzed, and validated.
What is FAIR data?
The FAIR guiding principles for scientific data management and stewardship were published in 2016 to ensure research data is:
Findable
Data should be deposited in a repository that assigns it a persistent identifier (PID), such as a digital object identifier (DOI). Use metadata to give a detailed description of your data.
Accessible
The repository must make the data and metadata retrievable using a standard protocol such as HTTP or HTTPS. The repository must continue to provide a landing page and the metadata even if the dataset is removed.
Interoperable
The metadata used to describe the data should be based on standard subject vocabularies and be machine-readable. You can find subject-specific standards at FAIRsharing.org.
Reusable
The metadata that describes the data should be accurate and relevant. An explicit data license should be applied to the data, explaining what other users can and cannot do with it.
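Checking metadata before deposit can make these principles easier to apply. The sketch below is a minimal Python example, assuming a simple dictionary-based metadata record. The field names and the mapping of fields to principles are hypothetical simplifications, not the FAIR principles themselves or any repository's schema, and the identifier and landing page URL are placeholders. The license is given as an SPDX identifier. The sketch reports which fields are missing for each principle and serializes the record as JSON, a machine-readable format.

```python
import json

# Fields this sketch treats as a minimum for each FAIR principle.
# The grouping and field names are illustrative assumptions, not the
# official FAIR principles or any repository's metadata schema.
REQUIRED_FIELDS = {
    "Findable": ["identifier", "title", "description"],
    "Accessible": ["landing_page"],
    "Interoperable": ["format", "keywords"],
    "Reusable": ["license", "creators"],
}


def missing_fair_fields(record):
    """Map each principle to its required fields that are absent or empty."""
    gaps = {}
    for principle, fields in REQUIRED_FIELDS.items():
        absent = [f for f in fields if not record.get(f)]
        if absent:
            gaps[principle] = absent
    return gaps


record = {
    # Placeholder identifier; a real one is assigned by the repository.
    "identifier": "https://doi.org/10.XXXX/placeholder",
    "title": "Example survey dataset",
    "description": "Anonymized responses to a hypothetical wellbeing survey.",
    # Placeholder landing page URL.
    "landing_page": "https://repository.example.org/records/placeholder",
    "format": "text/csv",
    "keywords": ["wellbeing", "survey"],
    "creators": ["Example, Researcher"],
    # SPDX identifier for the Creative Commons Attribution 4.0 license.
    "license": "CC-BY-4.0",
}

print(missing_fair_fields(record))  # {}
print(json.dumps(record, indent=2))  # machine-readable serialization
```

Repositories typically collect metadata through their own deposit forms and schemas, so a check like this is best treated as a pre-deposit aid rather than a substitute for them.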