Python API

An API to facilitate use of the CZI Science CELLxGENE Census. The Census is a versioned container of single-cell data hosted at CELLxGENE Discover.

The API is built on the tiledbsoma SOMA API, and provides a number of helper functions including:

  • Open a named version of the Census, for use with the SOMA API

  • Get a list of available Census versions, and for each version, a description

  • Get a slice of the Census as an AnnData, for use with Scanpy

  • Get the URI for, or directly download, underlying data in H5AD format

For more information on the API, visit the cellxgene_census repo. For more information on SOMA, see the tiledbsoma repo.

Open/retrieve Cell Census data


Open the Census by version or URI.


Return a tiledbsoma.SOMATileDBContext with sensible defaults that can be further customized by the user.


Open the named version of the Census, and return the URI for the dataset_id.


Download the source H5AD dataset, for the given dataset_id, to the user-specified file name.
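As a minimal sketch of the two H5AD helpers described above, the following looks up a dataset's source location and then downloads it to a local file. The helper h5ad_target_path and the wrapper fetch_source_h5ad are hypothetical names introduced for this example; the keyword arguments passed to get_source_h5ad_uri and download_source_h5ad are assumptions that should be checked against the installed version of cellxgene_census.

```python
from pathlib import Path


def h5ad_target_path(dataset_id, directory="."):
    """Hypothetical helper: choose a local file name for a dataset's H5AD."""
    return Path(directory) / f"{dataset_id}.h5ad"


def fetch_source_h5ad(dataset_id, directory=".", census_version="stable"):
    """Download the source H5AD for dataset_id and return the local path."""
    # Imported lazily so the path helper works without the package installed.
    import cellxgene_census

    target = h5ad_target_path(dataset_id, directory)

    # Look up where the source file lives for this Census version.
    locator = cellxgene_census.get_source_h5ad_uri(
        dataset_id, census_version=census_version
    )
    print(f"Source location: {locator}")

    # Download to the user-specified file name.
    cellxgene_census.download_source_h5ad(
        dataset_id, to_path=str(target), census_version=census_version
    )
    return target
```

Calling fetch_source_h5ad requires network access and a real dataset_id taken from the Census dataset table; none is supplied here.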

Get slice as AnnData


Convenience wrapper around tiledbsoma.Experiment query, to build and execute a query, and return it as an anndata.AnnData object.
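The wrapper described above can be sketched roughly as follows: open a Census version, pass a value filter for the obs axis, and receive an AnnData object. The helper build_value_filter, the wrapper fetch_slice, and the column names tissue_general and is_primary_data are assumptions made for illustration; the organism and obs_value_filter keywords are likewise assumed and should be verified against the installed package.

```python
def build_value_filter(conditions):
    """Hypothetical helper: turn {column: value} into a value-filter string.

    Lists and tuples become `in` tests; everything else becomes `==`.
    Values are rendered with repr(), so strings containing quotes are
    not handled by this sketch.
    """
    clauses = []
    for column, value in conditions.items():
        if isinstance(value, (list, tuple)):
            clauses.append(f"{column} in {list(value)!r}")
        else:
            clauses.append(f"{column} == {value!r}")
    return " and ".join(clauses)


def fetch_slice(conditions, organism="Homo sapiens", census_version="stable"):
    """Open the Census and return the matching cells as an AnnData object."""
    # Imported lazily so the filter helper works without the package installed.
    import cellxgene_census

    with cellxgene_census.open_soma(census_version=census_version) as census:
        return cellxgene_census.get_anndata(
            census,
            organism=organism,
            obs_value_filter=build_value_filter(conditions),
        )


obs_filter = build_value_filter({"tissue_general": "lung", "is_primary_data": True})
print(obs_filter)  # tissue_general == 'lung' and is_primary_data == True
```

Building the filter from a dictionary keeps the query conditions inspectable before any data is fetched; fetch_slice itself needs network access and can return a large object, so narrow filters are advisable.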

Feature presence matrix


Read the feature dataset presence matrix and return as a scipy.sparse.csr_array.
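To make the return type more concrete, the sketch below mimics the compressed sparse row (CSR) layout that scipy.sparse.csr_array uses, in plain Python and on a made-up 3 x 3 matrix. It assumes that rows index datasets and columns index features; the two lookup functions are hypothetical and are not part of the Census API.

```python
# Toy presence matrix in CSR form: 3 datasets (rows) x 3 features (columns).
# Row i's column indices are indices[indptr[i]:indptr[i + 1]].
# Dense equivalent:
#   dataset 0: [1, 0, 1]
#   dataset 1: [0, 1, 0]
#   dataset 2: [1, 1, 0]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 1]


def features_in_dataset(row):
    """Column indices (features) marked present for one dataset."""
    return indices[indptr[row]:indptr[row + 1]]


def datasets_with_feature(col):
    """Row indices (datasets) in which a given feature is marked present."""
    n_rows = len(indptr) - 1
    return [row for row in range(n_rows) if col in features_in_dataset(row)]


print(features_in_dataset(0))    # [0, 2]
print(datasets_with_feature(1))  # [1, 2]
```

Row lookups are cheap in CSR, while column lookups require a scan over rows, which is why a per-dataset question ("which features does this dataset report?") suits this layout better than a per-feature one.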

Versioning of Cell Census builds


Get release description for given Census version, from the Census release directory.


Get the directory of Census versions currently available, optionally filtering by specified flags.
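One way to use the version directory is to pick the most recent build from it. The sketch below assumes the directory is a dictionary mapping a version name to a description dictionary with a release_build field holding an ISO date string; that shape, the helper newest_release, and the sample entries are all assumptions made up for this example, not output from the real release directory.

```python
def newest_release(directory):
    """Return (name, description) for the entry with the latest release_build.

    Assumes release_build is an ISO date string (YYYY-MM-DD), so string
    comparison matches chronological order.
    """
    if not directory:
        raise ValueError("the version directory is empty")
    name = max(directory, key=lambda key: directory[key]["release_build"])
    return name, directory[name]


def newest_release_from_census():
    """Fetch the live directory and report its newest entry (needs network)."""
    # Imported lazily so the helper above works without the package installed.
    import cellxgene_census

    directory = cellxgene_census.get_census_version_directory()
    return newest_release(directory)


# Made-up entries, used only to exercise the helper.
sample_directory = {
    "example-a": {"release_build": "2000-01-01"},
    "example-b": {"release_build": "2000-03-01"},
    "example-c": {"release_build": "2000-02-01"},
}
name, description = newest_release(sample_directory)
print(name)  # example-b
```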

Experimental: Machine Learning

Factory method for

A torchdata.datapipes.iter.IterDataPipe that reads obs and X data from a tiledbsoma.Experiment, based upon the specified queries along the obs and var axes.

Statistics about the data retrieved by ExperimentDataPipe via SOMA API.

Abstract base class for methods to process CELLxGENE Census ExperimentAxisQuery results into a Hugging Face Dataset in which each item represents one cell.

Generate a Hugging Face Dataset containing Geneformer token sequences for each cell in CELLxGENE Census ExperimentAxisQuery results (human).

Experimental: Processing


Convenience wrapper around a tiledbsoma.Experiment query and the cellxgene_census.experimental.pp.highly_variable_genes() function: builds and executes a query, then annotates the genes in the query result (the var dataframe) based upon variability.


Identify and annotate highly variable genes contained in the query results.


Calculate mean and/or variance along the obs axis from query results.
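Query results are typically read in chunks, so a mean and variance along the obs axis can be accumulated without holding every value in memory. The sketch below shows one standard way to do this for a single gene, merging per-chunk summaries with the pairwise update of Chan et al. It is not the library's implementation; the function names, the sample values, and the choice of ddof=1 (sample variance) are assumptions made for this example.

```python
import math
import statistics


def chunk_summary(values):
    """Summarize one chunk as (count, mean, sum of squared deviations)."""
    n = len(values)
    if n == 0:
        return (0, 0.0, 0.0)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return (n, mean, m2)


def merge_summaries(a, b):
    """Combine two summaries into the summary of their concatenation."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)


def streaming_mean_variance(chunks, ddof=1):
    """Mean and variance over all chunks, visiting each chunk once."""
    total = (0, 0.0, 0.0)
    for chunk in chunks:
        total = merge_summaries(total, chunk_summary(chunk))
    n, mean, m2 = total
    variance = m2 / (n - ddof) if n > ddof else float("nan")
    return mean, variance


# Made-up expression values for one gene across seven cells, read three at a time.
expression = [0.0, 2.0, 0.0, 5.0, 1.0, 0.0, 3.0]
chunks = [expression[i:i + 3] for i in range(0, len(expression), 3)]
mean, variance = streaming_mean_variance(chunks)

# The streaming result agrees with a single-pass computation over all values.
assert math.isclose(mean, statistics.mean(expression))
assert math.isclose(variance, statistics.variance(expression))
```

Note that zeros count toward both statistics here; with sparse storage, where zeros are implicit, the cell count has to come from the obs axis rather than from the number of stored values.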

Experimental: Embeddings


Read cell (obs) embeddings and return as a dense numpy.ndarray.


Read embedding metadata and return as a Python dict.


Return metadata for a specific embedding.


Return a dictionary of all available embeddings for a given Census version.


Get a list of all Census versions that contain a specific embedding.