🚀 Census LTS release 2025-11-08 includes CELLxGENE Discover datasets from Macaca mulatta, Callithrix jacchus, and Pan troglodytes as well as Homo sapiens and Mus musculus.

🚀 Now in testing: Spatial! Builds now include data from Slide-seq and Visium assays. ⚠️ Opening these builds requires tiledbsoma>=1.15.3 ⚠️. Learn more!

CZ CELLxGENE Discover Census

The Census provides efficient computational tooling to access, query, and analyze all single-cell RNA data from CZ CELLxGENE Discover. Using a new access paradigm of cell-based slicing and querying, you can interact with the data through TileDB-SOMA, or get slices in AnnData, Seurat, or SingleCellExperiment objects, which reduces the data harmonization work needed before analysis.

Get started:

Citing Census

To cite the project, please follow the citation guidelines offered by CZ CELLxGENE Discover.

To cite individual studies, please refer to the tutorial Generating citations for Census slices.

Census Capabilities

The Census is a data object publicly hosted online and an API to open it. The object is built using the SOMA API specification and data model, and it is implemented via TileDB-SOMA. As such, the Census has all the data capabilities offered by TileDB-SOMA including:

Data access at scale:

Cloud-based data access.
Efficient access for larger-than-memory slices of data.
Query and access data based on cell or gene metadata at low latency.
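
As a rough illustration of metadata-based querying, the sketch below builds a SOMA value-filter string from a dictionary of column constraints and passes it to `cellxgene_census.get_anndata`. The helper names `build_value_filter` and `fetch_slice` and the example column values are illustrative assumptions, not part of the Census API, and `fetch_slice` needs the `cellxgene_census` package plus network access.

```python
from typing import Mapping, Optional, Sequence, Union

FilterValue = Union[str, bool, int, float, Sequence[str]]


def build_value_filter(constraints: Mapping[str, FilterValue]) -> str:
    """Combine column constraints into one value-filter expression.

    Scalars become `column == value`; lists and tuples become `column in [...]`.
    Hypothetical helper: it only formats the string and does not check
    column names against the Census schema.
    """
    clauses = []
    for column, value in constraints.items():
        if isinstance(value, (list, tuple)):
            clauses.append(f"{column} in {list(value)!r}")
        else:
            clauses.append(f"{column} == {value!r}")
    return " and ".join(clauses)


def fetch_slice(
    constraints: Mapping[str, FilterValue],
    organism: str = "Homo sapiens",
    var_value_filter: Optional[str] = None,
    census_version: str = "stable",
):
    """Fetch the matching cells as an AnnData object (requires network access)."""
    import cellxgene_census  # imported lazily so the helper above works without it

    with cellxgene_census.open_soma(census_version=census_version) as census:
        return cellxgene_census.get_anndata(
            census,
            organism=organism,
            obs_value_filter=build_value_filter(constraints),
            var_value_filter=var_value_filter,
        )
```

For example, `fetch_slice({"tissue_general": "lung", "cell_type": "B cell"})` would request lung B cells; passing a `var_value_filter` as well keeps the slice small by restricting the genes returned.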

Interoperability with existing single-cell toolkits:

Load and create AnnData objects.
Load and create Seurat objects.
Load and create SingleCellExperiment objects.

Interoperability with existing Python or R data structures:

From Python create PyArrow objects, SciPy sparse matrices, NumPy arrays, and pandas data frames.
From R create R Arrow objects, sparse matrices (via the Matrix package), and standard data frames and (dense) matrices.

Census Data and Schema

The Census data and its schema are described here.

⚠️ Note that the data includes:

Full-gene sequencing read counts (e.g. Smart-Seq2) and molecule counts (e.g. 10x).
Duplicate cells present across multiple datasets. These can be filtered in or out using the cell metadata variable is_primary_data.
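
To illustrate why `is_primary_data` matters, the toy example below counts cells with and without the filter. The rows are made-up stand-ins for Census cell metadata, not real records; against the real Census, the equivalent restriction would be a value filter such as `is_primary_data == True`.

```python
# Made-up cell metadata rows: the same cell appears in two datasets,
# but only one copy is flagged as primary.
toy_obs = [
    {"dataset_id": "dataset_A", "cell": "cell_1", "is_primary_data": True},
    {"dataset_id": "dataset_A", "cell": "cell_2", "is_primary_data": True},
    {"dataset_id": "dataset_B", "cell": "cell_1", "is_primary_data": False},
]


def primary_only(rows):
    """Keep only rows flagged as primary data, dropping duplicate copies."""
    return [row for row in rows if row["is_primary_data"]]


all_count = len(toy_obs)                    # counts cell_1 twice
unique_count = len(primary_only(toy_obs))   # counts each cell once
print(all_count, unique_count)
```

Skipping the filter would inflate cell counts for any cell type or tissue that is represented in more than one dataset.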

Census Data Releases

The Census data release plans are detailed here.

Starting May 15th, 2023, Census data releases with long-term support will be published every six months. These releases will be publicly accessible for at least five years. In addition, weekly releases may be published without any guarantee of permanence.
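
Because weekly builds carry no permanence guarantee, analyses meant to be reproducible are usually pinned to a dated long-term-support release. A minimal sketch, assuming the `census_version` argument of `cellxgene_census.open_soma` accepts either a `YYYY-MM-DD` release tag or the aliases `stable` and `latest`; the helpers `check_census_version` and `open_pinned` are hypothetical, and the check only validates the string format locally.

```python
import re
from datetime import date

# Assumed aliases: "stable" for the newest LTS release, "latest" for the newest weekly build.
ALIASES = {"stable", "latest"}


def check_census_version(tag: str) -> str:
    """Return the tag if it is an alias or a valid YYYY-MM-DD date, else raise ValueError."""
    if tag in ALIASES:
        return tag
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", tag):
        raise ValueError(f"expected YYYY-MM-DD, 'stable', or 'latest', got {tag!r}")
    date.fromisoformat(tag)  # raises ValueError for impossible dates such as month 13
    return tag


def open_pinned(tag: str):
    """Open a specific Census build (requires cellxgene_census and network access)."""
    import cellxgene_census  # imported lazily so the format check works without it

    return cellxgene_census.open_soma(census_version=check_census_version(tag))
```

Recording the dated tag (for instance `2025-11-08`) alongside results makes it possible to reopen the same build later, whereas an alias may resolve to a different build over time.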

Questions, Feedback and Issues

Users are encouraged to submit questions and feature requests about the Census via GitHub issues.
For quick support, you can join the CZI Science Community on Slack (czi.co/science-slack) and ask questions in the #cellxgene-census-users channel.
Users are encouraged to share their feedback by emailing soma@chanzuckerberg.com.
Bugs can be submitted via GitHub issues.
If you believe you have found a security issue, please disclose it by contacting security@chanzuckerberg.com.
Additional FAQs can be found here.

Projects and Tools Using Census

If you are interested in listing a project here, please reach out to us at soma@chanzuckerberg.com.