Python API

An API to facilitate use of the CZI Science CELLxGENE Census. The Census is a versioned container of single-cell data hosted at CELLxGENE Discover.

The API is built on the tiledbsoma SOMA API, and provides a number of helper functions including:

  • Open a named version of the Census, for use with the SOMA API

  • Get a list of available Census versions, and for each version, a description

  • Get a slice of the Census as an AnnData, for use with Scanpy

  • Get the URI for, or directly download, underlying data in H5AD format

For more information on the API, visit the cellxgene_census repo. For more information on SOMA, see the tiledbsoma repo.

Open/retrieve Cell Census data

cellxgene_census.open_soma

Open the Census by version or URI.

cellxgene_census.get_default_soma_context

Return a tiledbsoma.SOMATileDBContext with sensible defaults that can be further customized by the user.

cellxgene_census.get_source_h5ad_uri

Open the named version of the Census and return the URI of the source H5AD for the given dataset_id.

cellxgene_census.download_source_h5ad

Download the source H5AD dataset, for the given dataset_id, to the user-specified file name.

Get slice as AnnData

cellxgene_census.get_anndata

Convenience wrapper around a tiledbsoma.Experiment query that builds and executes the query and returns the result as an anndata.AnnData object.
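Queries are usually narrowed with an obs_value_filter string written in SOMA's value-filter syntax (comparisons such as ==, membership with in [...], combined with and). The helper below is a hypothetical convenience, not part of cellxgene_census: a minimal sketch that turns a dict of column criteria into such a string. The column names in the test (tissue_general, cell_type, is_primary_data) are Census obs columns used only as illustration.

```python
from typing import Mapping, Sequence, Union

Criterion = Union[str, bool, int, float, Sequence[str]]

def build_value_filter(criteria: Mapping[str, Criterion]) -> str:
    """Hypothetical helper: join column criteria into one value-filter expression."""
    clauses = []
    for column, wanted in criteria.items():
        # Strings are sequences too, so only lists/tuples count as "one of these values".
        if isinstance(wanted, (list, tuple)):
            clauses.append(f"{column} in {list(wanted)!r}")
        else:
            # repr() quotes strings and leaves booleans and numbers bare.
            clauses.append(f"{column} == {wanted!r}")
    # Every criterion must hold, so the clauses are joined with "and".
    return " and ".join(clauses)
```

The resulting string would then be passed as, for example, cellxgene_census.get_anndata(census, organism="Homo sapiens", obs_value_filter=flt). The sketch does no escaping beyond repr(), so unusual column names or values may need manual care.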

Feature presence matrix

cellxgene_census.get_presence_matrix

Read the feature dataset presence matrix and return it as a scipy.sparse.csr_array.
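Assuming rows are datasets and columns are features, a stored entry in this matrix means "this dataset measured this feature". To show what the CSR (compressed sparse row) layout holds, here is a stdlib-only sketch that builds the three CSR arrays from a hypothetical list of per-dataset feature sets and answers a presence lookup. It illustrates the data structure, not how the Census computes the matrix.

```python
from bisect import bisect_left
from typing import List, Set, Tuple

def presence_csr(measured: List[Set[int]], n_features: int) -> Tuple[List[int], List[int], List[bool]]:
    """Build CSR arrays for a boolean dataset x feature matrix.

    measured[i] is the set of feature ids present in dataset i (hypothetical input).
    """
    indptr, indices = [0], []
    for features in measured:
        # Column indices within each row are kept sorted, as CSR conventionally expects.
        indices.extend(sorted(f for f in features if 0 <= f < n_features))
        # indptr[i]:indptr[i + 1] delimits row i's slice of `indices`.
        indptr.append(len(indices))
    data = [True] * len(indices)
    return indptr, indices, data

def is_present(indptr: List[int], indices: List[int], dataset: int, feature: int) -> bool:
    lo, hi = indptr[dataset], indptr[dataset + 1]
    # Binary search is valid because each row's column indices are sorted.
    pos = bisect_left(indices, feature, lo, hi)
    return pos < hi and indices[pos] == feature
```

With SciPy installed, scipy.sparse.csr_array((data, indices, indptr), shape=(len(measured), n_features)) builds the equivalent sparse matrix from these arrays.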

Versioning of Cell Census builds

cellxgene_census.get_census_version_description

Get the release description for a given Census version from the Census release directory.

cellxgene_census.get_census_version_directory

Get the directory of Census versions currently available, optionally filtering by specified flags.
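The directory maps version names to their descriptions and also contains aliases such as "stable" and "latest". The sketch below shows one way such a directory could be resolved and filtered. Its layout is an assumption made for illustration (aliases stored as the name of another entry, and a per-entry "flags" dict with "lts" and "retracted" booleans), and the function and its lts_only / include_retracted parameters are hypothetical, not the signature of get_census_version_directory.

```python
from typing import Any, Dict, Union

Entry = Dict[str, Any]

def resolve_directory(raw: Dict[str, Union[str, Entry]], *, lts_only: bool = False,
                      include_retracted: bool = False) -> Dict[str, Entry]:
    """Hypothetical: resolve aliases and filter a version directory by flags."""
    resolved: Dict[str, Entry] = {}
    for name, entry in raw.items():
        # An alias such as "stable" is stored as the name of another entry (assumption).
        target = raw[entry] if isinstance(entry, str) else entry
        flags = target.get("flags", {})
        if lts_only and not flags.get("lts", False):
            continue
        if not include_retracted and flags.get("retracted", False):
            continue
        resolved[name] = target
    return resolved
```

Resolving aliases to the same description object means that a caller filtering on flags sees "stable" kept or dropped together with the version it points to.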

Experimental: Machine Learning

cellxgene_census.experimental.ml.pytorch.experiment_dataloader

Factory method for torch.utils.data.DataLoader.

cellxgene_census.experimental.ml.pytorch.ExperimentDataPipe

A torchdata.datapipes.iter.IterDataPipe that reads obs and X data from a tiledbsoma.Experiment, based upon the specified queries along the obs and var axes.
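A data pipe feeding a DataLoader has to decide, for each worker process, which cells (obs soma_joinids) it yields and in what mini-batches. The sketch below isolates that bookkeeping with a seeded shuffle, contiguous per-worker slices, and fixed-size batches. It is a hypothetical illustration of the general technique and not ExperimentDataPipe's actual algorithm, which also reads the X data for each batch.

```python
import random
from typing import Iterator, List, Optional, Sequence

def iter_batches(joinids: Sequence[int], batch_size: int, *, shuffle: bool = True,
                 seed: Optional[int] = None, worker_id: int = 0,
                 num_workers: int = 1) -> Iterator[List[int]]:
    """Hypothetical: yield this worker's mini-batches of obs soma_joinids."""
    ids = list(joinids)
    if shuffle:
        # All workers must pass the same seed so they agree on one global order;
        # otherwise their slices could overlap or miss cells.
        random.Random(seed).shuffle(ids)
    # Split the global order into contiguous, non-overlapping per-worker slices.
    per_worker = -(-len(ids) // num_workers)  # ceiling division
    mine = ids[worker_id * per_worker:(worker_id + 1) * per_worker]
    for start in range(0, len(mine), batch_size):
        yield mine[start:start + batch_size]
```

Inside a PyTorch DataLoader worker, torch.utils.data.get_worker_info() exposes the id and num_workers values that such a function would need.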

cellxgene_census.experimental.ml.pytorch.Stats

Statistics about the data retrieved by ExperimentDataPipe via SOMA API.

cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder

Abstract base class for methods to process CELLxGENE Census ExperimentAxisQuery results into a Hugging Face Dataset in which each item represents one cell.

cellxgene_census.experimental.ml.huggingface.GeneformerTokenizer

Generate a Hugging Face Dataset containing Geneformer token sequences for each cell in CELLxGENE Census ExperimentAxisQuery results (human).
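Geneformer represents each cell as a "rank value encoding": the cell's expressed genes, ordered by expression normalized against each gene's median across a reference corpus, with every gene replaced by its vocabulary token and the sequence truncated to a maximum length. The sketch below implements that idea on plain dicts. The median and token dictionaries are hypothetical stand-ins for the Geneformer dictionaries the real tokenizer loads, the 2048 default length is an assumption, and it does not reproduce every detail of GeneformerTokenizer.

```python
from typing import Dict, List

def rank_value_encode(counts: Dict[str, float], gene_medians: Dict[str, float],
                      token_ids: Dict[str, int], max_length: int = 2048) -> List[int]:
    """Sketch of rank value encoding for one cell (hypothetical dictionaries)."""
    total = sum(counts.values())
    if total <= 0:
        return []
    scored = []
    for gene, count in counts.items():
        median = gene_medians.get(gene, 0.0)
        # Keep only expressed genes that have a positive median and a token.
        if count <= 0 or median <= 0 or gene not in token_ids:
            continue
        # Depth-normalize, then divide by the gene's corpus-wide median so that
        # ubiquitously high genes (e.g. housekeeping genes) are ranked lower.
        scored.append((count / total / median, gene))
    # Highest normalized expression first; ties broken by gene name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [token_ids[gene] for _, gene in scored[:max_length]]
```

Dividing by the cell's total count scales every gene in that cell equally, so on its own it does not change the ordering; the per-gene median division is what reorders genes.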

Experimental: Processing

cellxgene_census.experimental.pp.get_highly_variable_genes

Convenience wrapper around a tiledbsoma.Experiment query and the cellxgene_census.experimental.pp.highly_variable_genes() function that builds and executes the query and annotates the genes in the query result (the var dataframe) based upon their variability.

cellxgene_census.experimental.pp.highly_variable_genes

Identify and annotate highly variable genes contained in the query results.
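As a simplified stand-in for the selection step, the sketch below flags the n genes with the highest dispersion (variance divided by mean). This is not the library's method: as we understand it, highly_variable_genes follows Scanpy's seurat_v3 flavor, which ranks genes by variance standardized against a fitted mean/variance trend. The function name and the dict-based input and output are hypothetical.

```python
from typing import Dict

def flag_highly_variable(means: Dict[str, float], variances: Dict[str, float],
                         n_top_genes: int) -> Dict[str, bool]:
    """Hypothetical: mark the n_top_genes genes with the largest variance/mean ratio."""
    # Genes with zero mean have undefined dispersion and are never selected.
    dispersion = {g: variances[g] / m for g, m in means.items() if m > 0}
    # Highest dispersion first; ties broken by gene name for determinism.
    ranked = sorted(dispersion, key=lambda g: (-dispersion[g], g))
    top = set(ranked[:n_top_genes])
    return {g: g in top for g in means}
```

Returning a flag for every gene mirrors the idea of annotating the var dataframe rather than subsetting it.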

cellxgene_census.experimental.pp.mean_variance

Calculate mean and/or variance along the obs axis from query results.
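Because X is stored sparsely, cells with no written value for a gene are zeros and still count towards that gene's mean and variance. The sketch below computes per-gene (var) statistics across all cells from (obs, var, value) triples, similar in shape to the COO tables SOMA returns, by accumulating a sum and a sum of squares. It is an illustrative sketch and not the library's implementation; the ddof=1 (sample variance) default is a choice of this sketch.

```python
from typing import Dict, Iterable, Tuple

def sparse_mean_variance(entries: Iterable[Tuple[int, int, float]], n_obs: int,
                         n_vars: int, ddof: int = 1) -> Dict[int, Tuple[float, float]]:
    """Sketch: per-var (mean, variance) over n_obs cells; missing entries are zeros."""
    sums = [0.0] * n_vars
    sq_sums = [0.0] * n_vars
    for _obs, var, value in entries:
        sums[var] += value
        sq_sums[var] += value * value
    stats = {}
    for var in range(n_vars):
        # Divide by n_obs, not by the number of nonzero entries.
        mean = sums[var] / n_obs
        # Sum of squared deviations, zeros included: sum(x^2) - n_obs * mean^2.
        # Clamp tiny negative results caused by floating-point rounding.
        ss = max(sq_sums[var] - n_obs * mean * mean, 0.0)
        stats[var] = (mean, ss / (n_obs - ddof))
    return stats
```

The sum-of-squares form needs only one pass over the data, which suits chunked reads, but it can lose precision for large values; a Welford-style update is the usual alternative when that matters.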

Experimental: Embeddings

cellxgene_census.experimental.get_embedding

Read cell (obs) embeddings and return as a dense numpy.ndarray.
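A dense result has exactly one row per requested obs soma_joinid, in request order, even though an embedding may not cover every cell. The sketch below assumes, for illustration, that embeddings are held in a dict keyed by soma_joinid and that uncovered cells come back as rows of NaN; the function name and storage are hypothetical, and a real call would return a numpy.ndarray rather than nested lists.

```python
import math
from typing import Dict, List, Sequence

def dense_embedding(stored: Dict[int, Sequence[float]], obs_joinids: Sequence[int],
                    dim: int) -> List[List[float]]:
    """Hypothetical: assemble one embedding row per requested soma_joinid."""
    rows = []
    for jid in obs_joinids:
        vec = stored.get(jid)
        if vec is None:
            # Cells without a stored embedding become a row of NaN (assumption).
            rows.append([math.nan] * dim)
        elif len(vec) != dim:
            raise ValueError(f"embedding for {jid} has length {len(vec)}, expected {dim}")
        else:
            rows.append(list(vec))
    return rows
```

Keeping request order and padding with NaN lets the result line up row-for-row with an obs dataframe fetched for the same joinids.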

cellxgene_census.experimental.get_embedding_metadata

Read embedding metadata and return as a Python dict.

cellxgene_census.experimental.get_embedding_metadata_by_name

Return metadata for a specific embedding.

cellxgene_census.experimental.get_all_available_embeddings

Return a dictionary of all available embeddings for a given Census version.

cellxgene_census.experimental.get_all_census_versions_with_embedding

Get a list of all Census versions that contain a specific embedding.
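The embedding lookups above amount to filtering a catalogue of embedding metadata records. The sketch below uses a hypothetical record layout (census_version, embedding_name and organism keys) to show how the versions carrying a given embedding could be collected; the real metadata fields may differ, and the version and embedding names in the test are placeholders.

```python
from typing import Dict, Iterable, List

def versions_with_embedding(manifest: Iterable[Dict[str, str]], embedding_name: str,
                            organism: str) -> List[str]:
    """Hypothetical: sorted, de-duplicated Census versions that carry an embedding."""
    versions = {
        record["census_version"]
        for record in manifest
        if record["embedding_name"] == embedding_name and record["organism"] == organism
    }
    return sorted(versions)
```

Collecting into a set first means a version listed by several records (for example, one per experiment) is reported only once.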