🚀 Now in testing: Spatial! From Jan 16th, latest builds will include data from Slide-seq and Visium assays. ⚠️ Opening these builds requires tiledbsoma>=1.15.3 ⚠️. Learn more!

🚀 New to the Census: Train PyTorch models directly on Census data with our efficient and easy-to-use PyTorch loaders. Learn more!

💻 Explore benchmarks of Census models and embeddings. See the report!

CZ CELLxGENE Discover Census

The Census provides efficient computational tooling to access, query, and analyze all single-cell RNA data from CZ CELLxGENE Discover. Using a new access paradigm of cell-based slicing and querying, you can interact with the data through TileDB-SOMA, or get slices in AnnData, Seurat, or SingleCellExperiment objects, thus accelerating your research by significantly minimizing data harmonization.

Get started:


Citing Census

To cite the project, please follow the citation guidelines offered by CZ CELLxGENE Discover.

To cite individual studies, please refer to the tutorial Generating citations for Census slices.

Census Capabilities

The Census is a data object publicly hosted online and an API to open it. The object is built using the SOMA API specification and data model, and it is implemented via TileDB-SOMA. As such, the Census has all the data capabilities offered by TileDB-SOMA including:

Data access at scale:

  • Cloud-based data access.

  • Efficient access for larger-than-memory slices of data.

  • Query and access data based on cell or gene metadata at low latency.

Interoperability with existing single-cell toolkits:

  • Get slices in AnnData, Seurat, or SingleCellExperiment objects.

Interoperability with existing Python or R data structures:

  • From Python create PyArrow objects, SciPy sparse matrices, NumPy arrays, and pandas data frames.

  • From R create R Arrow objects, sparse matrices (via the Matrix package), and standard data frames and (dense) matrices.
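As a rough illustration of metadata-based querying from Python, the sketch below reads cell metadata as a PyArrow table and converts it to a pandas data frame. It assumes the cellxgene_census Python package is installed and that the obs columns named in the usage note (sex, cell_type, tissue) exist in the release you open. The helpers build_value_filter and fetch_cell_metadata are hypothetical names introduced here for illustration; they are not part of the Census API.

```python
def build_value_filter(**equals):
    """Join `column == value` conditions with `and` into one filter string.

    Hypothetical helper for illustration only. Values are rendered with
    repr(), so strings come out quoted and booleans come out as True/False.
    """
    return " and ".join(f"{column} == {value!r}" for column, value in equals.items())


def fetch_cell_metadata(value_filter, column_names, census_version="stable"):
    """Read the matching rows of the human obs (cell metadata) data frame.

    Requires network access. The package is imported lazily so the filter
    helper above can be used without it.
    """
    import cellxgene_census

    with cellxgene_census.open_soma(census_version=census_version) as census:
        obs = census["census_data"]["homo_sapiens"].obs
        # read() yields PyArrow tables; concat() merges them into one table.
        table = obs.read(value_filter=value_filter, column_names=column_names).concat()
        return table.to_pandas()
```

For example, fetch_cell_metadata(build_value_filter(sex="female", cell_type="B cell"), ["tissue", "cell_type"]) would download only the requested columns for the matching cells, assuming those column names are present in the schema.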

Census Data and Schema

A description of the Census data and its schema is detailed here.

⚠️ Note that the data includes:

  • Full-gene sequencing read counts (e.g. Smart-Seq2) and molecule counts (e.g. 10X).

  • Duplicate cells present across multiple datasets; these can be filtered in or out using the cell metadata variable is_primary_data.
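One possible way to exclude duplicate cells when fetching an AnnData slice is to add an is_primary_data condition to the cell filter, as sketched below. This assumes the cellxgene_census Python package is installed. The helpers primary_only and get_primary_anndata are hypothetical names used only for this example, and the tissue filter in the usage note is likewise an assumption.

```python
def primary_only(value_filter=None):
    """Restrict an obs value filter to primary (non-duplicate) cells.

    Hypothetical helper for illustration only.
    """
    primary = "is_primary_data == True"
    if not value_filter:
        return primary
    # Parenthesize the caller's filter so any `or` inside it keeps its meaning.
    return f"({value_filter}) and {primary}"


def get_primary_anndata(value_filter=None, organism="Homo sapiens", census_version="stable"):
    """Fetch an AnnData slice that contains primary cells only.

    Requires network access; the package is imported lazily.
    """
    import cellxgene_census

    with cellxgene_census.open_soma(census_version=census_version) as census:
        return cellxgene_census.get_anndata(
            census,
            organism=organism,
            obs_value_filter=primary_only(value_filter),
        )
```

Calling get_primary_anndata("tissue == 'lung'") would then return lung cells with duplicates removed, under the assumptions above. Keep the filter narrow, since a broad query can return a very large slice.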

Census Data Releases

The Census data release plans are detailed here.

Starting May 15th, 2023, Census data releases with long-term support will be published every six months. These releases will be publicly accessible for at least five years. In addition, weekly releases may be published without any guarantee of permanence.
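The release policy above suggests pinning a long-term support release when results need to be reproducible. The sketch below shows one way to pick a release, assuming the cellxgene_census Python package, where, to our understanding, the alias "stable" resolves to the most recent long-term support release and "latest" to the most recent weekly release. The helpers choose_census_version and open_census are hypothetical names introduced for this example.

```python
import datetime
from typing import Optional


def choose_census_version(reproducible: bool, pinned_date: Optional[str] = None) -> str:
    """Pick a Census version string (hypothetical helper, for illustration).

    An explicit release date (YYYY-MM-DD) takes precedence. Otherwise
    "stable" is returned for reproducible work and "latest" for the
    newest weekly build.
    """
    if pinned_date is not None:
        # Raises ValueError if the string is not a valid ISO date.
        datetime.date.fromisoformat(pinned_date)
        return pinned_date
    return "stable" if reproducible else "latest"


def open_census(reproducible: bool = True, pinned_date: Optional[str] = None):
    """Open the chosen release. Requires network access; lazy import."""
    import cellxgene_census

    version = choose_census_version(reproducible, pinned_date)
    return cellxgene_census.open_soma(census_version=version)
```

The object returned by open_census() can be used as a context manager, for example with open_census(pinned_date="2023-05-15") as census: ..., assuming that release is still hosted.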

Questions, Feedback and Issues

  • Users are encouraged to submit questions and feature requests about the Census via GitHub issues.

  • For quick support, you can join the CZI Science Community on Slack (czi.co/science-slack) and ask questions in the #cellxgene-census-users channel.

  • Users are encouraged to share their feedback by emailing soma@chanzuckerberg.com.

  • Bugs can be submitted via GitHub issues.

  • If you believe you have found a security issue, please disclose it by contacting security@chanzuckerberg.com.

  • Additional FAQs can be found here.

Coming Soon!

  • We are currently working on creating the tooling necessary to perform data modeling at scale with seamless integration of the Census and PyTorch.

  • To increase the usability of the Census for research, in 2023 and 2024 we are planning to explore the following areas:

    • Include organism-wide normalized layers.

    • Include organism-wide embeddings.

    • On-demand information-rich subsampling.

Projects and Tools Using Census

If you are interested in listing a project here, please reach out to us at soma@chanzuckerberg.com.