The dos and don'ts of data sharing: passing our open data checks

All F1000 open research publishing venues have a progressive open data sharing policy to ensure all published research adheres to the highest standards of research integrity. Our Editorial team performs open data checks on every submitted article to maintain these standards. In this blog post, our Editorial team reveals the dos and don’ts of preparing your data so you can enjoy a smooth publishing journey.

Open data at F1000

First things first, what is open data?

Open data is data that is available for everyone to access, use, and share. In a research context, the term refers to any information or materials that have been collected or created as part of your research project, such as interviews and transcripts, models and algorithms, genome sequences, images, videos, and audio files. Open data is a core part of open science together with other practices and principles, including open access, open source software and code, and open policies and mandates.

The F1000 open data policy

The F1000 open data policy and guidelines endorse the FAIR data principles (Findable, Accessible, Interoperable, Reusable) and are central to our publishing ethos. Our policy is also aligned with the data sharing policies of major institutions and funders.

We aim to publish research that is transparent, trustworthy, and reproducible. We therefore ask our authors to make their scientific data publicly available by: 

  • Adding a data availability statement.  
  • Uploading data to a recognized, open online repository.  
  • Assigning all data a persistent identifier, such as a digital object identifier (DOI) or accession number, and an open license.
  • Describing the data with detailed metadata (e.g., rationale, methods of collection, and authors involved).  
  • Making the data “as open as possible and as closed as necessary”. In other words, if the authors have the appropriate permissions, and it is ethical and safe to do so, data should be shared openly. However, if necessary, authors can limit or restrict access to the data.

We encourage you to review the ‘Data Guidelines’ page of your chosen publishing venue to see our data policy in full.  

How does it work?

Upon submission, as part of our pre-publication checks, editors review the manuscript, the data availability statement, and whether data has been provided in an open format. Some imperfections will not prevent a manuscript from being processed, and authors can make touch-ups during the revision stage before the article is published.

Dos and don’ts of data sharing

Do upload your data to a recognized online repository   

F1000 will not host data files or supplementary materials alongside articles or allow data to be shared as part of the manuscript, as these methods don’t align with best practices in open data sharing.     

Publishing data in a repository with a citable DOI allows the data to stand on its own as a research product. You can use the DOI to cite your data in your research, and if other researchers reuse or refer to your data, you can receive attribution for producing the original datasets through data citations. Moreover, depositing data in a repository ensures long-term curation and storage, and makes the data easier for researchers in the wider scientific community to find via search engines.

We encourage you to use open, community-recognized, discipline-specific repositories where they are available. The options depend on the type of data involved in your research. For example, if you have social and economic data, you might choose the UK Data Service, while the Genome Sequence Archive (GSA) would be suitable for genomics data.

If there isn’t a discipline-specific repository for your data, you should use a generalist repository like Dryad, Figshare or Zenodo. You can also consider institutional, national, or public repositories. 

Do ensure that your article contains a data availability statement – even for restricted data  

A data availability statement is a required section of the manuscript describing all data underpinning the research and where it can be found. 

We recognize a few valid exemptions from sharing data, software, or code, including:

  • Confidentiality issues, such as the need to protect participants’ privacy
  • Ethical or security concerns with sharing the data
  • Third-party ownership of the data
  • Cases where data is too large to be feasibly hosted by an F1000-approved repository
  • Situations where third-party proprietary software has been used and there is no open source alternative that can carry out specific functions in the same manner

In these cases, we still require a detailed statement describing the reasons behind the restrictions and instructions for readers to apply for access.

Do upload the correct file formats   

Data files should be uploaded in open, non-proprietary file formats, so that they are machine-readable and readers and reviewers can open them without purchasing specialist software. For example, raw quantitative data should be presented in spreadsheets in CSV or XLSX format. For text files, use the .txt format, which can be opened by any text editing software, rather than Word documents, because proprietary formats can impede reuse, replication, and reproducibility.
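As a rough illustration of the point above, the sketch below writes a small table of raw observations to CSV text using only Python's standard library. The column names and values are invented for the example, and the `to_csv_text` helper is hypothetical rather than part of any F1000 tooling.

```python
import csv
import io

# Hypothetical raw measurements: one row per observation, not a summary.
rows = [
    {"participant_id": "P01", "age": 34, "score": 7.5},
    {"participant_id": "P02", "age": 41, "score": 6.0},
    {"participant_id": "P03", "age": 29, "score": None},  # missing value stays visible
]


def to_csv_text(records):
    """Serialize a list of dicts to CSV text, writing missing values as empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow({key: "" if value is None else value for key, value in record.items()})
    return buffer.getvalue()


csv_text = to_csv_text(rows)
print(csv_text)
```

Saving `csv_text` to a file with a `.csv` extension and UTF-8 encoding gives a file that any spreadsheet program or text editor can open.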

Do make sure to apply a license to your data   

Before publishing your data, make sure it is assigned an open license. A license defines the terms under which your data can be reused and cited. All datasets associated with articles submitted to F1000 publishing venues must have either a Creative Commons Public Domain Dedication (CC0) or a Creative Commons Attribution Only (CC-BY) license permitting maximum reuse by others with minimal restrictions. For software and code, we strongly advise you to use OSI-approved licenses to distribute your work.

Do remember to add as much metadata as required to your data project when you upload it to a repository  

Metadata is a set of data that describes and gives information about other data. Metadata can include the purpose of your study, how, where, and when your data was collected, methodological details, or information about approval for data collection. It helps others understand your data and the hypotheses your study set out to explore, and it gives readers the information they need to reuse your data or replicate your study. For this reason, your data repository record should include as much detail as possible to provide sufficient context for your study.
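To make this concrete, here is a minimal sketch of a metadata record kept as a JSON file alongside a dataset. Every field name and value is a placeholder chosen for illustration. Repositories define their own metadata forms and schemas, so treat this as a checklist of the kinds of detail mentioned above, not as a required format.

```python
import json

# Hypothetical metadata record; all field names and values are placeholders.
metadata = {
    "title": "Example survey dataset",
    "purpose": "Describe the aim of the study and the hypotheses explored.",
    "collection": {
        "method": "Online questionnaire",
        "location": "Placeholder location",
        "period": "Placeholder start and end dates",
    },
    "ethics_approval": "Placeholder approval reference",
    "files": [
        {
            "name": "raw_scores.csv",
            "description": "One row per participant",
            "columns": {
                "participant_id": "Anonymized participant code",
                "age": "Age in years",
                "score": "Questionnaire score; empty if missing",
            },
        }
    ],
}

# Serialize to readable JSON that can be saved next to the data files.
metadata_json = json.dumps(metadata, indent=2, ensure_ascii=False)
print(metadata_json)
```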

Don’t upload aggregated data  

We ask authors to share the raw data underlying their results in their repository project. Raw data includes the individual data points or the smallest unit of information the research is based upon. By contrast, aggregated data such as averages and percentages can only be re-analyzed with limited methods. Outliers and missing data also can’t be seen in aggregated data, so the “whole story” can’t be accessed. It is like reading a cover summary instead of the whole novel.
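The short sketch below illustrates why an average alone is not enough. It uses two invented groups of scores that share the same mean, although one of them contains an outlier and a missing observation. The numbers and the `summarize` helper are hypothetical.

```python
from statistics import mean

# Hypothetical raw scores; None marks a missing observation.
group_a = [10, 10, 10, 10, 10]
group_b = [2, 3, 4, 1, 40, None]


def summarize(values):
    """Return the mean plus the details that the mean alone would hide."""
    observed = [value for value in values if value is not None]
    return {
        "mean": mean(observed),
        "n_observed": len(observed),
        "n_missing": len(values) - len(observed),
        "max": max(observed),
    }


summary_a = summarize(group_a)
summary_b = summarize(group_b)

# Both groups report the same average, yet only the raw values reveal
# the outlier (40) and the missing observation in group_b.
print(summary_a)
print(summary_b)
```

A reader given only the two means could not tell these groups apart. With the raw values, they can spot the outlier, count the missing data, and choose their own analysis.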

Don’t provide a “private” data link  

Data should be publicly available before the article is published. Sometimes authors include a link to a project that is still in “draft mode” or only visible through a privately shared link, but this means the data is not accessible and has no permanent record.

Don’t submit an article without a data availability statement  

All articles must contain a data availability statement. If your article presents new research and analyses, your data availability statement must state where the data can be accessed, including the name of the repository and the persistent identifier. You should also mention the title of your datasets and the license you have applied. If you have used data that is restricted for data protection reasons, you should clearly identify and explain the restrictions in your statement.
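As a self-check before submission, the elements listed above can be expressed as a short checklist. The sketch below is a hypothetical helper, not an F1000 tool: the field names are invented, the DOI is a placeholder, and it covers only openly shared datasets, not restricted data.

```python
# Elements a statement for openly shared data should name.
REQUIRED_FIELDS = ("repository", "identifier", "dataset_title", "license")

# Dataset licenses accepted under the policy described in this post.
ACCEPTED_DATA_LICENSES = {"CC0", "CC-BY"}


def missing_elements(statement):
    """Return a list of problems found in a draft statement record."""
    problems = [
        field for field in REQUIRED_FIELDS
        if not (statement.get(field) or "").strip()
    ]
    license_name = (statement.get("license") or "").strip()
    if license_name and license_name not in ACCEPTED_DATA_LICENSES:
        problems.append("license (must be CC0 or CC-BY for datasets)")
    return problems


complete_draft = {
    "repository": "Zenodo",
    "identifier": "https://doi.org/10.0000/placeholder",  # placeholder, not a real DOI
    "dataset_title": "Example survey dataset",
    "license": "CC-BY",
}
incomplete_draft = {
    "repository": "Figshare",
    "dataset_title": "Example survey dataset",
    "license": "All rights reserved",
}

print(missing_elements(complete_draft))
print(missing_elements(incomplete_draft))
```

An empty list means every element is present. Anything else names what to add or fix before submitting.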

Don’t submit an article with the wrong data availability statement   

The statement “No data is associated with this article” can only apply to specific article types where no data was analyzed to draw conclusions. For example, a Study Protocol might have this statement because this article type typically consists of the introduction and methods sections of a standard Research Article, without the results and discussion sections. Opinion Articles will also likely include this statement, as this article type gives the authors’ personal perspectives on a topical issue. On the other hand, it would be contradictory to add such a statement to article types that report original research, including Research Articles, Method Articles, or Data Notes.

The F1000 Editorial team is here to help authors navigate making their research data openly available. We understand that sharing of data can feel like a daunting or complicated process, especially for researchers unfamiliar with the practice. 

This guidance will help you share your data confidently and clarify what our guidelines mean in practice. If you have any questions or doubts, please don’t hesitate to contact the F1000 Editorial team for further support and guidance.
