Dos and don’ts of data sharing
Do upload your data to a recognized online repository
F1000 will not host data files or supplementary materials alongside articles or allow data to be shared as part of the manuscript, as these methods don’t align with best practices in open data sharing.
Publishing data in a repository with a citable DOI allows the data to stand on its own as a research product. You can use the DOI to cite your data in your research, and if other researchers reuse or refer to your data, you can receive attribution for producing the original datasets through data citations. Moreover, depositing data in a repository ensures long-term curation and storage, and makes the data easier for researchers in the wider scientific community to find via search engines.
We encourage you to use open community-recognized and discipline-specific repositories where they are available. Depending on the type of data involved in your research, there are different options available. For example, if you have social and economic data you might choose to use the UK Data Service while the Genome Sequence Archive (GSA) would be suitable if you are sharing genomics data.
If there isn’t a discipline-specific repository for your data, you should use a generalist repository like Dryad, Figshare or Zenodo. You can also consider institutional, national, or public repositories.
Do ensure that your article contains a data availability statement – even for restricted data
A data availability statement is a required section of the manuscript describing all data underpinning the research and where it can be found.
We recognize a few valid reasons for not sharing data, software, or code, including:
- Confidentiality issues, such as the need to protect participants’ privacy
- Ethical or security concerns with sharing the data
- Third-party ownership of the data
- Cases where data is too large to be feasibly hosted by an F1000-approved repository
- Situations where third-party proprietary software has been used and there is no open source alternative that can carry out specific functions in the same manner
In these cases, we still require a detailed statement describing the reasons behind the restrictions and instructions for readers to apply for access.
Do upload the correct file formats
Data files should be uploaded in open, non-proprietary file formats that are machine-readable, so that readers and reviewers can open them without purchasing specialist software. For example, raw quantitative data should be presented in spreadsheets in CSV or XLSX format. Text files should be saved in .txt format, which any text editor can open, rather than as Word documents, because proprietary formats can impede reuse, replication, and reproducibility.
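As a rough illustration of the point about text files, the sketch below scans a list of file names and flags Word documents that could be re-saved as plain text before upload. The helper name `flag_files_to_convert` and the mapping are hypothetical and are not an official F1000 list of accepted formats.

```python
from pathlib import Path

# Assumption: an illustrative mapping only, covering the Word-to-text case
# mentioned above. It is not an official list of accepted or rejected formats.
SUGGESTED_ALTERNATIVE = {
    ".doc": ".txt",
    ".docx": ".txt",
}


def flag_files_to_convert(filenames):
    """Return (filename, suggested_extension) pairs for files worth re-saving."""
    flagged = []
    for name in filenames:
        suffix = Path(name).suffix.lower()  # compare extensions case-insensitively
        if suffix in SUGGESTED_ALTERNATIVE:
            flagged.append((name, SUGGESTED_ALTERNATIVE[suffix]))
    return flagged


# Hypothetical file names for demonstration.
files = ["measurements.csv", "interview_notes.docx", "README.TXT"]
for name, suggestion in flag_files_to_convert(files):
    print(f"{name}: consider re-saving as {suggestion}")
```

A check like this only looks at file extensions; it does not inspect file contents or perform the conversion itself.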
Do make sure to apply a license to your data
Before publishing your data, make sure it is assigned an open license. A license defines the terms under which your data can be reused and cited. All datasets associated with articles submitted to F1000 publishing venues must have either a Creative Commons Public Domain Dedication (CC0) or a Creative Commons Attribution Only (CC-BY) license permitting maximum reuse by others with minimal restrictions. For software and code, we strongly advise you to use OSI-approved licenses to distribute your work.
Do remember to add as much metadata as required to your data project when you upload it to a repository
Metadata is data that describes and gives information about other data. It can include the purpose of your study; how, where, and when your data was collected; methodological details; and information about approval for data collection. Metadata helps others understand your data and the hypotheses your study set out to explore, and gives readers the information they need to reuse your data or replicate your study. For this reason, your data repository record should include as much detail as possible to provide sufficient context for your study.
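One way to check a metadata record before upload is to list the items described above and see which are still empty. This is a minimal sketch: the field names are hypothetical, the values are placeholders, and real repositories define their own metadata schemas.

```python
import json

# Hypothetical field names derived from the items listed above.
# Real repositories define their own schemas and required fields.
EXPECTED_FIELDS = [
    "study_purpose",
    "collection_method",
    "collection_location",
    "collection_dates",
    "methodology",
    "ethics_approval",
]


def missing_metadata(record):
    """Return the expected fields that are absent or blank in the record."""
    return [
        field for field in EXPECTED_FIELDS
        if not str(record.get(field, "")).strip()
    ]


# Placeholder values for demonstration only.
record = {
    "study_purpose": "<what the study set out to explore>",
    "collection_method": "<how the data was collected>",
    "collection_location": "<where the data was collected>",
    "collection_dates": "<when the data was collected>",
    "methodology": "<methodological details>",
    "ethics_approval": "",  # left blank to show how a gap is reported
}

print(json.dumps(record, indent=2))
print("Missing fields:", missing_metadata(record))
```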
Don’t upload aggregated data
We ask authors to share the raw data underlying their results in their repository project. Raw data includes the individual data points or the smallest unit of information the research is based upon. By contrast, aggregated data such as averages and percentages can only be re-analyzed with limited methods. Outliers and missing data also can’t be seen in aggregated data, so the “whole story” can’t be accessed, much like reading a cover summary instead of a whole novel.
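The difference can be seen in a small sketch. The values below are invented purely for illustration, with `None` standing in for a missing measurement. The summary reports only a count and a mean, while the raw list shows both the gap and the unusually large value.

```python
from statistics import mean

# Invented values for illustration only; None marks a missing measurement.
raw = [4.9, 5.1, 5.0, None, 5.2, 19.8]

# The aggregated view: what a reader sees if only a summary is shared.
observed = [value for value in raw if value is not None]
summary = {"n": len(observed), "mean": round(mean(observed), 2)}

# Details that only the raw data reveals.
missing_count = sum(1 for value in raw if value is None)
largest = max(observed)

print("Aggregated summary:", summary)
print("Missing measurements:", missing_count)
print("Largest observed value:", largest)
```

Anyone given only the summary could not tell that one value is far from the rest or that a measurement is missing, which limits the re-analysis they can do.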
Don’t provide a “private” data link
Data should be publicly available before the article is published. Sometimes authors include a link to a project still in “draft mode” or only visible using a privately shared link, but this means the data is not accessible and has no permanent record.
Don’t submit an article without a data availability statement
All articles must contain a data availability statement. If your article presents new research and analyses, your data availability statement must state where the data can be accessed, including the persistent identifier and the name of the repository. You should also mention the title of your datasets and the license you have applied. If you have used data that is restricted for data protection reasons, you should clearly explain and identify the restrictions in your statement.
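The elements listed above can be gathered into a simple checklist before drafting the statement. The sketch below is one possible way to do that: the `Dataset` class, the `build_statement` helper, the sentence wording, and the placeholder identifier are all hypothetical, and the output is not an official F1000 template. The two license labels reflect the CC0 and CC-BY options named earlier in this guidance.

```python
from dataclasses import dataclass

# Licenses named in this guidance for datasets.
ALLOWED_LICENSES = {"CC0", "CC-BY"}


@dataclass
class Dataset:
    repository: str   # name of the repository
    title: str        # title of the dataset
    identifier: str   # persistent identifier, for example a DOI
    license: str      # license applied to the dataset


def build_statement(ds):
    """Assemble a draft statement, refusing to continue if anything is missing."""
    missing = [name for name, value in vars(ds).items() if not value.strip()]
    if missing:
        raise ValueError(f"Missing elements: {', '.join(missing)}")
    if ds.license not in ALLOWED_LICENSES:
        raise ValueError(f"Unexpected dataset license: {ds.license}")
    # Hypothetical wording; adapt it to the journal's own guidance.
    return (
        f"{ds.repository}: {ds.title}. {ds.identifier}. "
        f"Data are available under the terms of the {ds.license} license."
    )


# Placeholder identifier and title for demonstration only.
example = Dataset(
    repository="Zenodo",
    title="<title of your dataset>",
    identifier="https://doi.org/10.XXXX/placeholder",
    license="CC-BY",
)
print(build_statement(example))
```

Raising an error for a blank element makes it harder to submit a statement that omits the repository name, dataset title, persistent identifier, or license.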
Don’t submit an article with the wrong data availability statement
The statement “No data is associated with this article” can only apply to specific article types where no data was analyzed to draw conclusions. For example, a Study Protocol might have this statement because this article type is typically the introduction and methods section of a standard Research Article, without having the results and discussion section included. Opinion Articles will also likely include this statement as this article type gives the authors’ own personal perspectives on a topical issue. On the other hand, it would be contradictory to add such a statement to article types that report original research, including Research Articles, Method Articles, or Data Notes.
The F1000 Editorial Team is here to help authors navigate making their research data openly available. We understand that sharing data can feel like a daunting or complicated process, especially for researchers unfamiliar with the practice.
This guidance will help you share your data confidently and clarify what our guidelines mean in practice. If you have any questions or doubts, please don’t hesitate to contact the F1000 Editorial Team for further support and guidance.