New to the Census: Train PyTorch models directly with Census data with our efficient and easy-to-use PyTorch loaders. Learn more!
Explore benchmarks of Census models and embeddings. See the report!
CZ CELLxGENE Discover Census
The Census provides efficient computational tooling to access, query, and analyze all single-cell RNA data from CZ CELLxGENE Discover. Using a new access paradigm of cell-based slicing and querying, you can interact with the data through TileDB-SOMA, or get slices in AnnData, Seurat, or SingleCellExperiment objects, thus accelerating your research by significantly minimizing data harmonization.
Get started:
Citing Census
To cite the project, please follow the citation guidelines offered by CZ CELLxGENE Discover.
To cite individual studies, please refer to the tutorial Generating citations for Census slices.
Census Capabilities
The Census is a data object publicly hosted online and an API to open it. The object is built using the SOMA API specification and data model, and it is implemented via TileDB-SOMA. As such, the Census has all the data capabilities offered by TileDB-SOMA including:
Data access at scale:
Cloud-based data access.
Efficient access for larger-than-memory slices of data.
Query and access data based on cell or gene metadata at low latency.
Interoperability with existing single-cell toolkits:
Load and create AnnData objects.
Load and create Seurat objects.
Load and create SingleCellExperiment objects.
Interoperability with existing Python or R data structures:
Census Data and Schema
A description of the Census data and its schema is detailed here.
⚠️ Note that the data includes:
Full-gene sequencing read counts (e.g. Smart-Seq2) and molecule counts (e.g. 10X).
Duplicate cells present across multiple datasets; these can be filtered in or out using the cell metadata variable is_primary_data.
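As a minimal sketch of the duplicate-cell filtering described above, the snippet below composes a value filter that keeps only cells flagged by is_primary_data and passes it to the Census Python API. The helper names build_obs_filter and fetch_primary_cells are hypothetical, and the tissue_general and feature_name column names and the "Homo sapiens" organism are assumptions taken from the Census schema rather than from this page. The download step needs the third-party cellxgene-census package and network access, so it is kept in a separate function.

```python
def build_obs_filter(tissue: str, primary_only: bool = True) -> str:
    """Compose a value-filter string over cell (obs) metadata.

    Hypothetical helper; `tissue_general` is an assumed obs column name.
    """
    clauses = [f"tissue_general == '{tissue}'"]
    if primary_only:
        # is_primary_data marks the primary copy of a cell that is
        # present in more than one dataset.
        clauses.append("is_primary_data == True")
    return " and ".join(clauses)


def fetch_primary_cells(tissue: str, genes: list[str]):
    """Fetch a de-duplicated AnnData slice (requires network access)."""
    # Imported lazily so the filter helper works without the package installed.
    import cellxgene_census

    gene_list = ", ".join(f"'{g}'" for g in genes)
    with cellxgene_census.open_soma() as census:
        return cellxgene_census.get_anndata(
            census,
            organism="Homo sapiens",  # assumption: human data
            obs_value_filter=build_obs_filter(tissue),
            var_value_filter=f"feature_name in [{gene_list}]",
        )
```

Calling build_obs_filter("lung", primary_only=False) drops the is_primary_data clause, which keeps duplicates in the slice when that is what an analysis needs.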
Census Data Releases
The Census data release plans are detailed here.
Starting May 15th, 2023, Census data releases with long-term support will be published every six months. These releases will be publicly accessible for at least five years. In addition, weekly releases may be published without any guarantee of permanence.
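Because long-term support releases stay accessible for years while weekly ones may disappear, an analysis is easier to reproduce when it pins a release. The sketch below assumes that releases are addressed by an ISO date tag (the "2023-05-15" used in the usage note is inferred from the start date above) or by the "stable" and "latest" aliases of cellxgene_census.open_soma; the helper names check_version and open_pinned_census are hypothetical.

```python
from datetime import date

# Aliases assumed to be accepted by cellxgene_census.open_soma.
ALIASES = {"stable", "latest"}


def check_version(version: str) -> str:
    """Return the tag unchanged if it is an alias or an ISO date, else raise ValueError."""
    if version in ALIASES:
        return version
    date.fromisoformat(version)  # raises ValueError for malformed tags
    return version


def open_pinned_census(version: str = "stable"):
    """Open the Census at a fixed release (requires network access)."""
    # Imported lazily so the tag check works without the package installed.
    import cellxgene_census

    return cellxgene_census.open_soma(census_version=check_version(version))
```

For example, open_pinned_census("2023-05-15") would request the release tagged with that date, provided such a tag exists.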
Questions, Feedback and Issues
Users are encouraged to submit questions and feature requests about the Census via GitHub issues.
For quick support, you can join the CZI Science Community on Slack (czi.co/science-slack) and ask questions in the #cellxgene-census-users channel.
Users are encouraged to share their feedback by emailing soma@chanzuckerberg.com.
Bugs can be submitted via GitHub issues.
If you believe you have found a security issue, please disclose it by contacting security@chanzuckerberg.com.
Additional FAQs can be found here.
Coming Soon!
We are building the tooling needed to perform data modeling at scale, with seamless integration of the Census and PyTorch.
To increase the usability of the Census for research, in 2023 and 2024 we are planning to explore the following areas:
Include organism-wide normalized layers.
Include organism-wide embeddings.
On-demand information-rich subsampling.
Projects and Tools Using Census
If you are interested in listing a project here, please reach out to us at soma@chanzuckerberg.com.