Python API

Open/retrieve Cell Census data

cellxgene_census.open_soma(*, census_version: str | None = 'stable', uri: str | None = None, context: SOMATileDBContext | None = None) → Collection

Open the Census by version or URI.

Parameters:
  • census_version – The version of the Census, e.g. “latest”. Defaults to “stable”.

  • uri – The URI containing the Census SOMA objects. If specified, it takes precedence over the census_version parameter.

  • context – A custom SOMATileDBContext.

Returns:

A SOMA Collection object containing the top-level census. It can be used as a context manager, which will automatically close upon exit.

Raises:

ValueError – if the census cannot be found, the URI cannot be opened, or neither a URI nor a version is specified.

Lifecycle

Experimental.

Examples

Open the default Census version, using a context manager that automatically closes the Census on exit.

>>> with cellxgene_census.open_soma() as census:
        ...

Open and close:

>>> census = cellxgene_census.open_soma()
>>> ...
>>> census.close()

Open a specific Census by version:

>>> with cellxgene_census.open_soma(census_version="2022-12-31") as census:
        ...

Open a Census by S3 URI, rather than by version.

>>> with cellxgene_census.open_soma(uri="s3://bucket/path") as census:
        ...

Open a Census by path (file:// URI), rather than by version.

>>> with cellxgene_census.open_soma(uri="/tmp/census") as census:
        ...
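
The precedence between uri and census_version described above can be pictured with a small stand-alone sketch. The function name resolve_census_target is hypothetical and is not part of cellxgene_census; it only mirrors the documented rules (a URI wins, and supplying neither is an error) and does not open anything.

```python
def resolve_census_target(*, census_version="stable", uri=None):
    """Hypothetical helper mirroring the documented argument rules.

    Returns a (kind, value) tuple describing what would be opened.
    This is an illustration only, not the library's implementation.
    """
    if uri is not None:
        # A URI, when given, takes precedence over census_version.
        return ("uri", uri)
    if census_version is not None:
        return ("version", census_version)
    # Neither argument was supplied.
    raise ValueError("Specify either a census version or a URI.")


print(resolve_census_target())
print(resolve_census_target(census_version="2022-12-31", uri="s3://bucket/path"))
```

Both arguments are keyword-only here, matching the signature above.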

cellxgene_census.get_source_h5ad_uri(dataset_id: str, *, census_version: str = 'latest') → CensusLocator

Open the named version of the census, and return the URI for the dataset_id. This does not guarantee that the H5AD exists or is accessible to the user.

Parameters:
  • dataset_id – The dataset_id of interest.

  • census_version – The census version.

Returns:

A CensusLocator object that contains the URI and optional S3 region for the source H5AD.

Raises:

KeyError – if either dataset_id or census_version does not exist.

Lifecycle

Experimental.

Examples

>>> cellxgene_census.get_source_h5ad_uri("cb5efdb0-f91c-4cbd-9ad4-9d4fa41c572d")
{'uri': 's3://cellxgene-data-public/cell-census/2022-12-01/h5ads/cb5efdb0-f91c-4cbd-9ad4-9d4fa41c572d.h5ad',
's3_region': 'us-west-2'}
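
A locator like the one printed above is often passed on to an S3 client, which usually wants the bucket and object key separately. The following is a minimal sketch using only the standard library, assuming the locator behaves like the dictionary shown in the example; split_s3_uri is a hypothetical helper, not part of cellxgene_census.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri!r}")
    # netloc holds the bucket; the path (minus its leading slash) is the key.
    return parsed.netloc, parsed.path.lstrip("/")


# Locator copied from the example output above.
locator = {
    "uri": "s3://cellxgene-data-public/cell-census/2022-12-01/h5ads/"
           "cb5efdb0-f91c-4cbd-9ad4-9d4fa41c572d.h5ad",
    "s3_region": "us-west-2",
}

bucket, key = split_s3_uri(locator["uri"])
print(bucket)
print(key)
```

As noted above, a well-formed locator does not guarantee that the H5AD exists or is accessible.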

cellxgene_census.download_source_h5ad(dataset_id: str, to_path: str, *, census_version: str = 'latest') → None

Download the source H5AD dataset, for the given dataset_id, to the user-specified file name.

Parameters:
  • dataset_id – Fetch the source (original) H5AD associated with this dataset_id.

  • to_path – The file name where the downloaded H5AD will be written. Must not already exist.

  • census_version – The census version name. Defaults to latest.

Raises:

ValueError – if to_path already exists (an existing file is never overwritten) or is not a file.

Lifecycle

Experimental.

See also

get_source_h5ad_uri(): Look up the location of the source H5AD.

Examples

>>> download_source_h5ad("8e47ed12-c658-4252-b126-381df8d52a3d", to_path="/tmp/data.h5ad")
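
Because to_path must not already exist, scripts that may be re-run often check for the file first. The sketch below shows one way to do that; download_if_missing is a hypothetical wrapper, and the download function is passed in as an argument so the snippet does not depend on network access.

```python
import os


def download_if_missing(dataset_id, to_path, download_fn):
    """Call download_fn only when to_path does not exist yet.

    download_fn is expected to accept (dataset_id, to_path=...), as
    download_source_h5ad does per the signature above.
    Returns True if a download was attempted, False if it was skipped.
    """
    if os.path.exists(to_path):
        # Skip: the documented behavior is to raise rather than overwrite.
        return False
    download_fn(dataset_id, to_path=to_path)
    return True


# Intended usage (requires cellxgene_census and network access):
#   import cellxgene_census
#   download_if_missing(
#       "8e47ed12-c658-4252-b126-381df8d52a3d",
#       "/tmp/data.h5ad",
#       cellxgene_census.download_source_h5ad,
#   )
```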

Get slice as AnnData

cellxgene_census.get_anndata(census: Collection, organism: str, measurement_name: str = 'RNA', X_name: str = 'raw', obs_value_filter: str | None = None, obs_coords: None | bytes | Slice[bytes] | Sequence[bytes] | float | Slice[float] | Sequence[float] | int | Slice[int] | Sequence[int] | slice | Slice[slice] | Sequence[slice] | str | Slice[str] | Sequence[str] | datetime64 | Slice[datetime64] | Sequence[datetime64] | TimestampType | Slice[TimestampType] | Sequence[TimestampType] | Array | ChunkedArray | ndarray[Any, dtype[integer]] | ndarray[Any, dtype[datetime64]] = None, var_value_filter: str | None = None, var_coords: None | bytes | Slice[bytes] | Sequence[bytes] | float | Slice[float] | Sequence[float] | int | Slice[int] | Sequence[int] | slice | Slice[slice] | Sequence[slice] | str | Slice[str] | Sequence[str] | datetime64 | Slice[datetime64] | Sequence[datetime64] | TimestampType | Slice[TimestampType] | Sequence[TimestampType] | Array | ChunkedArray | ndarray[Any, dtype[integer]] | ndarray[Any, dtype[datetime64]] = None, column_names: AxisColumnNames | None = None) → AnnData

Convenience wrapper around a soma.Experiment query that builds and executes the query and returns the result as an anndata.AnnData object.

Parameters:
  • census – The census object, usually returned by cellxgene_census.open_soma().

  • organism – The organism to query, usually one of Homo sapiens or Mus musculus.

  • measurement_name – The measurement object to query. Defaults to RNA.

  • X_name – The X layer to query. Defaults to raw.

  • obs_value_filter – Value filter for the obs metadata. Value is a filter query written in the SOMA value_filter syntax.

  • obs_coords – Coordinates for the obs axis, which is indexed by the soma_joinid value. May be an int, a list of int, or a slice. The default, None, selects all.

  • var_value_filter – Value filter for the var metadata. Value is a filter query written in the SOMA value_filter syntax.

  • var_coords – Coordinates for the var axis, which is indexed by the soma_joinid value. May be an int, a list of int, or a slice. The default, None, selects all.

  • column_names – Columns to fetch for obs and var dataframes.

Returns:

An anndata.AnnData object containing the census slice.

Lifecycle

Experimental.

Examples

>>> get_anndata(census, "Mus musculus", obs_value_filter="tissue_general in ['brain', 'lung']")
>>> get_anndata(census, "Homo sapiens", column_names={"obs": ["tissue"]})
>>> get_anndata(census, "Homo sapiens", obs_coords=slice(0, 1000))
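
Value filters such as the one in the first example are plain strings, so they can be assembled programmatically. The helpers below are a hypothetical sketch (build_in_filter and join_and are not part of cellxgene_census) that produces strings in the same shape as the example. It assumes simple string values and that clauses may be combined with `and`; values containing quote characters are not handled specially.

```python
def build_in_filter(column, values):
    """Build a membership filter, e.g. "tissue_general in ['brain', 'lung']"."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    # repr() of a plain string yields a single-quoted literal.
    quoted = ", ".join(repr(str(v)) for v in values)
    return f"{column} in [{quoted}]"


def join_and(*clauses):
    """Combine simple clauses with 'and' (assumes no 'or' inside any clause)."""
    return " and ".join(clauses)


tissue_filter = build_in_filter("tissue_general", ["brain", "lung"])
print(tissue_filter)

# Intended usage (requires an open census):
#   get_anndata(census, "Mus musculus", obs_value_filter=tissue_filter)
```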

Feature presence matrix

cellxgene_census.get_presence_matrix(census: Collection, organism: str, measurement_name: str = 'RNA') → csr_matrix

Read the feature dataset presence matrix and return as a SciPy sparse CSR array. The returned sparse matrix is indexed on the first dimension by the dataset soma_joinid values, and on the second dimension by the var DataFrame soma_joinid values.

Parameters:
  • census – The census from which to read the presence matrix.

  • organism – The organism to query, usually one of Homo sapiens or Mus musculus.

  • measurement_name – The measurement object to query. Defaults to RNA.

Returns:

A scipy.sparse.csr_array object containing the presence matrix.

Raises:

ValueError – if the organism cannot be found.

Lifecycle

Experimental.

Examples

>>> get_presence_matrix(census, "Homo sapiens", "RNA")
<321x60554 sparse array of type '<class 'numpy.uint8'>'
with 6441269 stored elements in Compressed Sparse Row format>
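
Since rows correspond to dataset soma_joinid values and columns to var soma_joinid values, typical questions are "which datasets include this feature?" and "how many datasets include each feature?". The sketch below answers both on a small toy matrix standing in for the real result. The helper names are hypothetical, and NumPy and SciPy are assumed to be installed.

```python
import numpy as np
from scipy import sparse

# Toy stand-in for the matrix returned by get_presence_matrix():
# 3 datasets (rows) x 4 features (columns); 1 means the feature is present.
presence = sparse.csr_matrix(
    np.array(
        [[1, 0, 1, 0],
         [1, 1, 0, 0],
         [0, 0, 1, 0]],
        dtype=np.uint8,
    )
)


def datasets_with_feature(presence, var_joinid):
    """Return the dataset row indices in which the given feature is present."""
    rows, cols = presence.nonzero()
    return np.sort(rows[cols == var_joinid])


def datasets_per_feature(presence):
    """Return, for every feature column, the number of datasets containing it."""
    # np.asarray handles both matrix-style and array-style sparse results.
    return np.asarray(presence.sum(axis=0)).ravel()


print(datasets_with_feature(presence, 0))
print(datasets_per_feature(presence))
```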

Versioning of Cell Census builds

cellxgene_census.get_census_version_description(census_version: str) → CensusVersionDescription

Get release description for given Census version, from the Census release directory.

Parameters:
  • census_version – The census version name.

Returns:

CensusVersionDescription - a dictionary containing a description of the release.

Raises:

KeyError – if the census_version value is unknown.

Lifecycle

Experimental.

See also

get_census_version_directory(): returns the entire directory as a dict.

Examples

>>> cellxgene_census.get_census_version_description("latest")
{'release_date': None,
'release_build': '2022-12-01',
'soma': {'uri': 's3://cellxgene-data-public/cell-census/2022-12-01/soma/',
's3_region': 'us-west-2'},
'h5ads': {'uri': 's3://cellxgene-data-public/cell-census/2022-12-01/h5ads/',
's3_region': 'us-west-2'}}

cellxgene_census.get_census_version_directory() → Dict[str, CensusVersionDescription]

Get the directory of Census releases currently available.

Returns:

A dictionary that contains release names and their corresponding release description.

Lifecycle

Experimental.

See also

get_census_version_description(): get description by census_version.

Examples

>>> cellxgene_census.get_census_version_directory()
{'latest': {'release_date': None,
'release_build': '2022-12-01',
'soma': {'uri': 's3://cellxgene-data-public/cell-census/2022-12-01/soma/',
's3_region': 'us-west-2'},
'h5ads': {'uri': 's3://cellxgene-data-public/cell-census/2022-12-01/h5ads/',
's3_region': 'us-west-2'}},
'2022-12-01': {'release_date': None,
'release_build': '2022-12-01',
'soma': {'uri': 's3://cellxgene-data-public/cell-census/2022-12-01/soma/',
's3_region': 'us-west-2'},
'h5ads': {'uri': 's3://cellxgene-data-public/cell-census/2022-12-01/h5ads/',
's3_region': 'us-west-2'}},
'2022-11-29': {'release_date': None,
'release_build': '2022-11-29',
'soma': {'uri': 's3://cellxgene-data-public/cell-census/2022-11-29/soma/',
's3_region': 'us-west-2'},
'h5ads': {'uri': 's3://cellxgene-data-public/cell-census/2022-11-29/h5ads/',
's3_region': 'us-west-2'}}}
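
In the output above, the "latest" entry points to the same build as "2022-12-01", so it can be useful to group directory names by build or to look up the SOMA URI behind a name. The sketch below works on a dictionary literal copied from the example rather than a live call; names_by_build and soma_uri are hypothetical helpers, not part of cellxgene_census.

```python
def _entry(build):
    """Build one directory entry in the shape shown in the example output."""
    base = f"s3://cellxgene-data-public/cell-census/{build}"
    return {
        "release_date": None,
        "release_build": build,
        "soma": {"uri": f"{base}/soma/", "s3_region": "us-west-2"},
        "h5ads": {"uri": f"{base}/h5ads/", "s3_region": "us-west-2"},
    }


# Same content as the example output of get_census_version_directory().
directory = {
    "latest": _entry("2022-12-01"),
    "2022-12-01": _entry("2022-12-01"),
    "2022-11-29": _entry("2022-11-29"),
}


def names_by_build(directory):
    """Group directory names (including aliases) by their release_build."""
    groups = {}
    for name, description in directory.items():
        groups.setdefault(description["release_build"], []).append(name)
    return groups


def soma_uri(directory, census_version):
    """Return the SOMA URI for a name; raises KeyError if the name is unknown."""
    return directory[census_version]["soma"]["uri"]


print(names_by_build(directory))
print(soma_uri(directory, "latest"))
```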