Python API

An API to facilitate use of the CZI Science CELLxGENE Census. The Census is a versioned container of single-cell data hosted at CELLxGENE Discover.

The API is built on the tiledbsoma SOMA API, and provides a number of helper functions including:

Open a named version of the Census, for use with the SOMA API

Get a list of available Census versions, and for each version, a description

Get a slice of the Census as an AnnData, for use with ScanPy

Get the URI for, or directly download, underlying data in H5AD format

For more information on the API, visit the cellxgene_census repo. For more information on SOMA, see the tiledbsoma repo.

Open/retrieve Cell Census data

`cellxgene_census.open_soma`	Open the Census by version or URI.
`cellxgene_census.get_default_soma_context`	Return a `tiledbsoma.SOMATileDBContext` with sensible defaults that can be further customized by the user.
`cellxgene_census.get_source_h5ad_uri`	Open the named version of the Census and return the URI of the source H5AD for the given `dataset_id`.
`cellxgene_census.download_source_h5ad`	Download the source H5AD dataset for the given `dataset_id` to the user-specified file name.
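A minimal sketch of the functions above. It assumes a recent `cellxgene-census` release is installed and the Census storage is reachable over the network. The helper names (`census_summary`, `h5ad_path`, `fetch_source_h5ad`) are hypothetical, and the TileDB buffer setting only illustrates how the default context can be customized. Callers supply a real Census version and `dataset_id`.

```python
"""Sketch: open a pinned Census release and fetch one source H5AD.

Assumes `pip install cellxgene-census` and network access. The helper
names in this module are hypothetical, not part of cellxgene_census.
"""
from pathlib import Path


def census_summary(census_version: str) -> dict:
    """Open a specific Census release and return its summary label/value pairs."""
    import cellxgene_census  # deferred so this module imports without the package

    # Start from the library defaults, then override one TileDB setting
    # (smaller read buffers trade speed for lower memory use).
    context = cellxgene_census.get_default_soma_context(
        tiledb_config={"py.init_buffer_bytes": 128 * 1024**2}
    )
    # Pinning an explicit version (instead of the default "stable")
    # keeps results reproducible across releases.
    with cellxgene_census.open_soma(census_version=census_version, context=context) as census:
        summary = census["census_info"]["summary"].read().concat().to_pandas()
    return dict(zip(summary["label"], summary["value"]))


def h5ad_path(dataset_id: str, directory: str = ".") -> Path:
    """Local file name used for a downloaded source dataset."""
    return Path(directory) / f"{dataset_id}.h5ad"


def fetch_source_h5ad(dataset_id: str, census_version: str, directory: str = ".") -> Path:
    """Print the source URI for `dataset_id`, then download the H5AD."""
    import cellxgene_census

    # get_source_h5ad_uri returns a small dict; "uri" holds the remote location.
    locator = cellxgene_census.get_source_h5ad_uri(dataset_id, census_version=census_version)
    print("source H5AD:", locator["uri"])

    target = h5ad_path(dataset_id, directory)
    cellxgene_census.download_source_h5ad(
        dataset_id, to_path=str(target), census_version=census_version
    )
    return target
```

Using a context manager closes the underlying TileDB handles as soon as the block exits.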

Get slice as AnnData

`cellxgene_census.get_anndata`	Convenience wrapper around `tiledbsoma.Experiment` query, to build and execute a query, and return it as an `anndata.AnnData` object.
`cellxgene_census.get_obs`	Get the observation metadata for a query on the census.
`cellxgene_census.get_var`	Get the variable metadata for a query on the census.
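A sketch of a typical slice, assuming a `cellxgene-census` release that includes `get_obs` and network access. The filter values mirror common Census examples, and `value_filter_in` is a hypothetical helper for building SOMA value-filter strings. Reading `obs` first is a cheap way to check how many cells match before materializing `X`.

```python
"""Sketch: slice the Census into pandas metadata and an AnnData.

Assumes cellxgene-census is installed (a release that includes get_obs)
and network access. `value_filter_in` is a hypothetical helper.
"""


def value_filter_in(column: str, values: list[str]) -> str:
    """Build a SOMA value-filter clause such as "cell_type in ['a', 'b']"."""
    quoted = ", ".join(repr(v) for v in values)
    return f"{column} in [{quoted}]"


def slice_female_microglia_and_neurons(census_version: str = "stable"):
    """Return (obs DataFrame, AnnData) for two cell types in female donors."""
    import cellxgene_census

    obs_filter = "sex == 'female' and " + value_filter_in("cell_type", ["microglial cell", "neuron"])
    var_filter = value_filter_in("feature_id", ["ENSG00000161798", "ENSG00000188229"])

    with cellxgene_census.open_soma(census_version=census_version) as census:
        # Metadata only: no expression data is read here.
        obs = cellxgene_census.get_obs(
            census, "Homo sapiens", value_filter=obs_filter, column_names=["cell_type", "tissue"]
        )
        # Full slice: X ("raw" layer by default) plus the selected obs columns.
        adata = cellxgene_census.get_anndata(
            census,
            organism="Homo sapiens",
            obs_value_filter=obs_filter,
            var_value_filter=var_filter,
            column_names={"obs": ["assay", "cell_type", "tissue", "disease"]},
        )
    return obs, adata
```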

Feature presence matrix

`cellxgene_census.get_presence_matrix`	Read the feature dataset presence matrix and return as a `scipy.sparse.csr_array`.
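A sketch that answers "which source datasets measured this gene?". It assumes, as in the Census schema, that presence-matrix rows are dataset `soma_joinid`s and columns are `var` `soma_joinid`s, and that only stored `True` entries indicate presence. `datasets_measuring` walks the CSR `indptr`/`indices`/`data` arrays directly; both helper names are hypothetical.

```python
"""Sketch: which source datasets measured a given gene?

Rows of the presence matrix are assumed to be dataset soma_joinids and
columns var soma_joinids. Helper names are hypothetical.
"""


def datasets_measuring(presence, var_joinid: int) -> list[int]:
    """Return the row (dataset) indices whose stored entry for var_joinid is True."""
    indptr, indices, data = presence.indptr, presence.indices, presence.data
    rows = []
    for row in range(len(indptr) - 1):
        # indices[indptr[row]:indptr[row + 1]] are the columns stored for this row.
        for k in range(indptr[row], indptr[row + 1]):
            if indices[k] == var_joinid and data[k]:
                rows.append(row)
                break
    return rows


def datasets_for_gene(feature_id: str, organism: str = "Homo sapiens", census_version: str = "stable"):
    """Return dataset_id/dataset_title for every dataset that measured feature_id."""
    import cellxgene_census

    with cellxgene_census.open_soma(census_version=census_version) as census:
        presence = cellxgene_census.get_presence_matrix(census, organism, "RNA")
        var = cellxgene_census.get_var(
            census, organism, value_filter=f"feature_id == '{feature_id}'", column_names=["soma_joinid"]
        )
        datasets = census["census_info"]["datasets"].read().concat().to_pandas()

    var_joinid = int(var["soma_joinid"].iloc[0])
    hits = datasets_measuring(presence, var_joinid)
    # Presence-matrix rows line up with soma_joinid in census_info/datasets.
    return datasets.set_index("soma_joinid").loc[hits, ["dataset_id", "dataset_title"]]
```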

Versioning of Cell Census builds

`cellxgene_census.get_census_version_description`	Get release description for given Census version, from the Census release directory.
`cellxgene_census.get_census_version_directory`	Get the directory of Census versions currently available, optionally filtering by specified flags.
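A sketch for pinning analyses to a concrete release. It assumes, as in current releases, that concrete version names are ISO dates (`YYYY-MM-DD`) while `stable` and `latest` are aliases. The newest dated entry may be a weekly build rather than a long-term-support release. The helper names are hypothetical.

```python
"""Sketch: pick the newest dated Census release and read its description.

Assumes concrete version names are ISO dates (YYYY-MM-DD) while "stable"
and "latest" are aliases. Helper names are hypothetical.
"""
import re

_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def latest_dated_release(directory: dict) -> str:
    """Return the newest date-named version key; ISO dates sort lexicographically."""
    dated = [name for name in directory if _ISO_DATE.fullmatch(name)]
    if not dated:
        raise ValueError("no date-named Census versions found")
    return max(dated)


def describe_latest_release() -> tuple[str, dict]:
    """Return (version name, release description) for the newest dated release."""
    import cellxgene_census

    directory = cellxgene_census.get_census_version_directory()
    version = latest_dated_release(directory)
    return version, cellxgene_census.get_census_version_description(version)
```

The returned version string can then be passed to `open_soma(census_version=...)` so that a notebook keeps reading the same data after `stable` moves.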

Experimental: Machine Learning

`cellxgene_census.experimental.ml.pytorch.experiment_dataloader`	Factory method for `torch.utils.data.DataLoader`.
`cellxgene_census.experimental.ml.pytorch.ExperimentDataPipe`	A `torchdata.datapipes.iter.IterDataPipe` that reads `obs` and `X` data from a `tiledbsoma.Experiment`, based upon the specified queries along the `obs` and `var` axes.
`cellxgene_census.experimental.ml.pytorch.Stats`	Statistics about the data retrieved by `ExperimentDataPipe` via SOMA API.
`cellxgene_census.experimental.ml.encoders.Encoder`	Base class for `obs` encoders.
`cellxgene_census.experimental.ml.encoders.LabelEncoder`	Default encoder based on `sklearn.preprocessing.LabelEncoder`.
`cellxgene_census.experimental.ml.encoders.BatchEncoder`	An encoder that concatenates and encodes several `obs` columns.
`cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder`	Abstract base class for methods to process CELLxGENE Census ExperimentAxisQuery results into a Hugging Face Dataset in which each item represents one cell.
`cellxgene_census.experimental.ml.huggingface.GeneformerTokenizer`	Generate a Hugging Face Dataset containing Geneformer token sequences for each cell in CELLxGENE Census ExperimentAxisQuery results (human).
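A sketch of feeding Census cells to PyTorch. It assumes `cellxgene-census` is installed with its experimental ML dependencies (`torch`, `torchdata`) and network access; the tongue query, batch size, and helper name are illustrative. Batching and shuffling are configured on the datapipe, so they are not passed to `experiment_dataloader`.

```python
"""Sketch: stream Census cells into PyTorch mini-batches.

Assumes cellxgene-census with its experimental ML extras (torch,
torchdata) and network access. The query and helper name are illustrative.
"""


def make_tongue_loader(census_version: str = "stable", batch_size: int = 128):
    """Return (census, datapipe, loader); the caller must close `census` when done."""
    import tiledbsoma as soma
    import cellxgene_census
    from cellxgene_census.experimental.ml.pytorch import ExperimentDataPipe, experiment_dataloader

    census = cellxgene_census.open_soma(census_version=census_version)
    experiment = census["census_data"]["homo_sapiens"]
    datapipe = ExperimentDataPipe(
        experiment,
        measurement_name="RNA",
        X_name="raw",
        obs_query=soma.AxisQuery(value_filter="tissue_general == 'tongue' and is_primary_data == True"),
        obs_column_names=["cell_type"],  # label-encoded by the default encoder
        batch_size=batch_size,
        shuffle=True,
    )
    # The datapipe already batches and shuffles, so only it is passed here.
    return census, datapipe, experiment_dataloader(datapipe)
```

The datapipe reads lazily, so the Census handle is returned and should stay open while iterating over the loader; close it (for example in a `finally` clause) once training is done.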

Experimental: Processing

`cellxgene_census.experimental.pp.get_highly_variable_genes`	Convenience wrapper around `tiledbsoma.Experiment` query and `cellxgene_census.experimental.pp.highly_variable_genes()` function, to build and execute a query, and annotate the query result genes (`var` dataframe) based upon variability.
`cellxgene_census.experimental.pp.highly_variable_genes`	Identify and annotate highly variable genes contained in the query results.
`cellxgene_census.experimental.pp.mean_variance`	Calculate mean and/or variance along the `obs` axis from query results.
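A sketch of ranking variable genes for one tissue without building an AnnData, assuming `cellxgene-census` and network access. The tissue filter and gene count are illustrative, and the `highly_variable` column name assumes scanpy-style output, so check the returned DataFrame's columns in the installed version.

```python
"""Sketch: rank highly variable genes for one tissue.

Assumes cellxgene-census and network access; the filter, gene count, and
helper name are illustrative.
"""


def heart_hvgs(census_version: str = "stable", n_top_genes: int = 500):
    """Return the var rows flagged as highly variable among primary heart cells."""
    import cellxgene_census
    from cellxgene_census.experimental.pp import get_highly_variable_genes

    with cellxgene_census.open_soma(census_version=census_version) as census:
        hvg = get_highly_variable_genes(
            census,
            organism="Homo sapiens",
            obs_value_filter="is_primary_data == True and tissue_general == 'heart'",
            n_top_genes=n_top_genes,
        )
    # One row per gene; the boolean "highly_variable" column marks the selected genes.
    return hvg[hvg["highly_variable"]]
```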

Experimental: Embeddings

`cellxgene_census.experimental.get_embedding`	Read cell (obs) embeddings and return as a dense `numpy.ndarray`.
`cellxgene_census.experimental.get_embedding_metadata`	Read embedding metadata and return as a Python dict.
`cellxgene_census.experimental.get_embedding_metadata_by_name`	Return metadata for a specific embedding.
`cellxgene_census.experimental.get_all_available_embeddings`	Return a dictionary of all available embeddings for a given Census version.
`cellxgene_census.experimental.get_all_census_versions_with_embedding`	Get a list of all census versions that contain a specific embedding.
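A sketch of discovering where a named embedding is published before reading it. The embedding name `"scvi"` and the organism spelling `"homo_sapiens"` are assumptions to check against the output of `get_all_available_embeddings`; the helper name is hypothetical.

```python
"""Sketch: discover where a named embedding is available.

Assumes cellxgene-census with the experimental embeddings API. The
embedding name and organism spelling are assumptions.
"""


def embedding_overview(
    embedding_name: str = "scvi", organism: str = "homo_sapiens", census_version: str = "stable"
):
    """Return (versions carrying the embedding, metadata for one version)."""
    from cellxgene_census.experimental import (
        get_all_census_versions_with_embedding,
        get_embedding_metadata_by_name,
    )

    # Which Census releases publish this embedding for the organism?
    versions = get_all_census_versions_with_embedding(embedding_name, organism)
    # Metadata (id, title, and so on) for the embedding in one release.
    metadata = get_embedding_metadata_by_name(embedding_name, organism, census_version=census_version)
    return versions, metadata
```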