open science for biomedical research
This section provides guidance on improving data management to ease the process of sharing and maximize your research impact, and also considers options for where to publish your data (e.g., selecting repositories).
Curating and documenting data sufficiently for other researchers to use takes time and effort, and data management competes for attention with other high-priority research activities like generating data in the next experiment.
Given this pressure, the most straightforward answer to “Why share research data?” is that a variety of stakeholders require it, including:
For example, the new NIH Policy on Data Management and Sharing comes into effect on January 25, 2023, and many projects are already adjusting their data management practices to accommodate these expectations.
Data sharing is becoming an increasingly frequent expectation because of the need for data to ensure reproducibility of published scientific research. In addition to improved robustness of scientific findings, data sharing can also benefit individual researchers (see Data sharing and how it can benefit your scientific career) by encouraging collaboration and reuse of existing data, both of which can accelerate the rate of scientific progress.
The content in this topic is framed around the idea of making data as FAIR as possible: Findable, Accessible, Interoperable, and Reusable.
For more information on these guiding ideas, see The FAIR Guiding Principles for scientific data management and stewardship, an article describing the principles with examples.
This topic provides general guidance on best practices for organizing and documenting data, both within a single dataset and across the data for an entire project. Good practices in data management and documentation are essential for efficient and timely data sharing, and they also improve workflows across the entire data life cycle.
The DataONE Best Practices Primer, from which the image above is derived, describes guidelines for managing data throughout the life cycle of a project. Additional resources related to overall data management include:
A great deal of scientific data is tabular in nature, with data organized into rows and columns. While spreadsheets make it easier for us to view and work with data, some common spreadsheet practices make it difficult to interpret and reuse the data later. These resources identify ways to improve the entry and organization of data in this format.
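As a rough illustration of the kind of spreadsheet problems that hinder reuse, the sketch below scans CSV text for a few of them: unnamed columns, repeated column names, blank rows, and rows whose field count does not match the header. The function name `find_table_problems` and the particular checks are assumptions chosen for this example, not a checklist taken from the resources in this section.

```python
import csv
import io


def find_table_problems(csv_text):
    """Return a list of human-readable problems found in CSV text.

    The checks are illustrative assumptions, not an official checklist.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = [name.strip() for name in rows[0]]

    # Every column should have a name.
    for index, name in enumerate(header, start=1):
        if not name:
            problems.append(f"column {index} has no header")

    # Column names should be unique.
    seen = set()
    for name in header:
        if name and name in seen:
            problems.append(f"duplicate header: {name}")
        seen.add(name)

    # Every data row should be non-blank and have one value per column.
    for row_number, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"row {row_number} is blank")
        elif len(row) != len(header):
            problems.append(
                f"row {row_number} has {len(row)} fields, expected {len(header)}"
            )
    return problems


# Hypothetical example data with a repeated column name and a short row.
example = "sample_id,treatment,treatment\nS1,control,0\nS2,drug\n"
for problem in find_table_problems(example):
    print(problem)
```

A check like this could run whenever a table is exported from a spreadsheet, so that structural problems are caught before the file is shared.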
Most research projects include multiple data files, representing different data types and structures. These data are subsequently filtered and manipulated during analysis, resulting in an even larger number of diverse files. The following resources provide a breadth of information for considering data management:
These resources focus on projects that include a computational component, and include approaches to support automation of data management:
Metadata refers to information and details about data. For more information about different types of metadata and how it can be used in contexts ranging from social media to museums, please see Understanding Metadata: What is Metadata, and What is it For?: A Primer. The resources below will help you think about metadata in the context of scientific research, which focuses on providing information about the way data was collected and analyzed so that other researchers can understand and reuse it. Remember, though, that documenting your data is as important for yourself (and your collaborators) as for other scientists who may be interested in using it, and that documentation should occur throughout the research process, not only when it comes time to publish.
One of the most common tools for recording metadata is a README, an extra file or document associated with the data that describes important information about the dataset and how it was created. The following resources provide general information about READMEs, as well as additional guidance on how to document data in different contexts:
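To make the idea of a README concrete, here is a minimal sketch that assembles a plain-text README from a few pieces of information about a dataset. The function name `build_readme`, the section headings, and the example values are all assumptions made for illustration; the resources in this section describe what a README for your own data should actually contain.

```python
def build_readme(title, description, files, contact):
    """Return plain-text README content for a dataset.

    `files` maps each file name to a one-line description.
    The layout is an illustrative assumption, not a required format.
    """
    lines = [title, "=" * len(title), "", description, "", "Files", "-----"]
    # List files alphabetically so the README is stable between runs.
    for name in sorted(files):
        lines.append(f"{name}: {files[name]}")
    lines += ["", "Contact", "-------", contact]
    return "\n".join(lines) + "\n"


# Hypothetical dataset details, used only to show the output shape.
readme_text = build_readme(
    title="Example imaging dataset",
    description="Raw and processed images, with how they were collected.",
    files={
        "raw_images.zip": "unprocessed microscope output",
        "analysis.csv": "per-cell measurements derived from the raw images",
    },
    contact="Name and email of the person responsible for the data",
)
print(readme_text)
```

Generating the file from a script is optional; the point is that the README travels with the data and is updated as the data changes.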
Identifying what data to publish and where to deposit the data can be a daunting task. This section identifies types of data repositories and examples of repositories common in biomedical research.
There are a variety of different types of data repositories appropriate for depositing and/or archiving biomedical data:
For a more comprehensive exploration of biomedical data repositories, please see An overview of biomedical platforms for managing research data.
The table below highlights data repositories that have been recommended by biomedical researchers for particular data types.
Repository | Data types |
---|---|
Cellxgene Data Portal | Single cell projects funded by the CZI Single Cell Program, submissions handled by Lattice |
Cell Image Library | image (still, video, z-stack, time series) |
Brain Image Library | image (brain) |
Image Data Resource | image (cell and tissue) |
Data Dryad | any (also see section below) |
Dryad is a repository that accepts research data of any type and format. Please view the Dryad topic page for more information and examples of data submissions.
We publish data so that it will be accessible to other researchers in the future. The following articles provide some context for issues associated with data reuse, including data ethics and privacy/security: