Python API
Open/retrieve Cell Census data
- cellxgene_census.open_soma(*, census_version: str | None = 'stable', uri: str | None = None, context: SOMATileDBContext | None = None) → Collection
Open the Census by version or URI.
- Parameters:
census_version – The version of the Census, e.g. “latest”. Defaults to “stable”.
uri – The URI containing the Census SOMA objects. If specified, it takes precedence over the census_version parameter.
context – A custom SOMATileDBContext.
- Returns:
A SOMA Collection object containing the top-level census. It can be used as a context manager, which will automatically close upon exit.
- Raises:
ValueError – if the census cannot be found, the URI cannot be opened, or neither a URI nor a version is specified.
Lifecycle
Experimental.
Examples
Open the default Census version, using a context manager which will automatically close the Census upon exit of the context.
>>> with cellxgene_census.open_soma() as census: ...
Open and close:
>>> census = cellxgene_census.open_soma()
>>> ...
>>> census.close()
Open a specific Census by version:
>>> with cellxgene_census.open_soma("2022-12-31") as census: ...
Open a Census by S3 URI, rather than by version.
>>> with cellxgene_census.open_soma(uri="s3://bucket/path") as census: ...
Open a Census by path (file:// URI), rather than by version.
>>> with cellxgene_census.open_soma(uri="/tmp/census") as census: ...
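The lookup rules described above (an explicit uri takes precedence over census_version, and a missing or unknown version raises ValueError) can be sketched in plain Python. This is a minimal sketch, not the library's implementation: resolve_census_uri is a hypothetical helper, and DIRECTORY is a trimmed copy of the release directory shown in the get_census_version_directory() example.

```python
from typing import Any, Dict, Optional

# Trimmed copy of the release directory from the
# get_census_version_directory() example (only the "latest" entry).
DIRECTORY: Dict[str, Dict[str, Any]] = {
    "latest": {
        "release_build": "2022-12-01",
        "soma": {
            "uri": "s3://cellxgene-data-public/cell-census/2022-12-01/soma/",
            "s3_region": "us-west-2",
        },
    },
}


def resolve_census_uri(
    directory: Dict[str, Dict[str, Any]],
    census_version: Optional[str] = "stable",
    uri: Optional[str] = None,
) -> str:
    """Hypothetical helper mirroring the documented open_soma() lookup rules."""
    # An explicit URI always wins over the version name.
    if uri is not None:
        return uri
    # Neither a URI nor a version was given: there is nothing to open.
    if census_version is None:
        raise ValueError("Specify either a census_version or a uri.")
    # Unknown version names are reported as ValueError, as open_soma() documents.
    if census_version not in directory:
        raise ValueError(f"Unknown census_version: {census_version!r}")
    return directory[census_version]["soma"]["uri"]


print(resolve_census_uri(DIRECTORY, "latest"))
print(resolve_census_uri(DIRECTORY, "latest", uri="/tmp/census"))
```

Because the trimmed directory has no "stable" entry, calling the sketch with its default census_version raises ValueError.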
- cellxgene_census.get_source_h5ad_uri(dataset_id: str, *, census_version: str = 'latest') → CensusLocator
Open the named version of the census and return the URI for the dataset_id. This does not guarantee that the H5AD exists or is accessible to the user.
- Parameters:
dataset_id – The dataset_id of interest.
census_version – The census version. Defaults to latest.
- Returns:
A CensusLocator object that contains the URI and optional S3 region for the source H5AD.
- Raises:
KeyError – if either dataset_id or census_version does not exist.
Lifecycle
Experimental.
Examples
>>> cellxgene_census.get_source_h5ad_uri("cb5efdb0-f91c-4cbd-9ad4-9d4fa41c572d")
{'uri': 's3://cellxgene-data-public/cell-census/2022-12-01/h5ads/cb5efdb0-f91c-4cbd-9ad4-9d4fa41c572d.h5ad', 's3_region': 'us-west-2'}
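In the example output, the source H5AD URI looks like the release's h5ads prefix followed by <dataset_id>.h5ad. The sketch below builds a CensusLocator-shaped dict under that assumption. It is inferred from the example only: sketch_source_h5ad_uri is hypothetical, it does not consult the Census's own dataset metadata, and so it cannot raise KeyError for an unknown dataset_id the way the real function does.

```python
from typing import Any, Dict, Optional

# Trimmed copy of the release directory from the
# get_census_version_directory() example (only the "latest" entry).
DIRECTORY: Dict[str, Dict[str, Any]] = {
    "latest": {
        "release_build": "2022-12-01",
        "h5ads": {
            "uri": "s3://cellxgene-data-public/cell-census/2022-12-01/h5ads/",
            "s3_region": "us-west-2",
        },
    },
}


def sketch_source_h5ad_uri(
    directory: Dict[str, Dict[str, Any]],
    dataset_id: str,
    census_version: str = "latest",
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: build a CensusLocator-shaped dict for a dataset."""
    # An unknown census_version raises KeyError here, matching the documented behavior.
    h5ads = directory[census_version]["h5ads"]
    base = h5ads["uri"]
    if not base.endswith("/"):
        base += "/"
    # Assumed naming convention, inferred from the example: <h5ads prefix><dataset_id>.h5ad
    return {"uri": f"{base}{dataset_id}.h5ad", "s3_region": h5ads.get("s3_region")}


print(sketch_source_h5ad_uri(DIRECTORY, "cb5efdb0-f91c-4cbd-9ad4-9d4fa41c572d"))
```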
- cellxgene_census.download_source_h5ad(dataset_id: str, to_path: str, *, census_version: str = 'latest') → None
Download the source H5AD dataset, for the given dataset_id, to the user-specified file name.
- Parameters:
dataset_id – Fetch the source (original) H5AD associated with this dataset_id.
to_path – The file name where the downloaded H5AD will be written. Must not already exist.
census_version – The census version name. Defaults to latest.
- Raises:
ValueError – if the path already exists (i.e., will not overwrite an existing file), or is not a file.
Lifecycle
Experimental.
See also
get_source_h5ad_uri(): Look up the location of the source H5AD.
Examples
>>> download_source_h5ad("8e47ed12-c658-4252-b126-381df8d52a3d", to_path="/tmp/data.h5ad")
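Since download_source_h5ad() refuses to overwrite an existing path, a caller may want to check the target before starting a potentially large download. This is a minimal caller-side sketch: check_download_target is a hypothetical helper, and its parent-directory check is an extra assumption of this sketch, not documented behavior of the library.

```python
import os


def check_download_target(to_path: str) -> None:
    """Hypothetical pre-flight check mirroring the documented no-overwrite rule."""
    # Documented rule: an existing path (file or otherwise) is never overwritten.
    if os.path.exists(to_path):
        raise ValueError(f"Path {to_path!r} already exists; refusing to overwrite it.")
    # Extra check added by this sketch: the file's parent directory must exist.
    parent = os.path.dirname(os.path.abspath(to_path))
    if not os.path.isdir(parent):
        raise ValueError(f"Parent directory {parent!r} does not exist.")
```

If the check passes, the same to_path can be handed to download_source_h5ad().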
Get slice as AnnData
- cellxgene_census.get_anndata(census: Collection, organism: str, measurement_name: str = 'RNA', X_name: str = 'raw', obs_value_filter: str | None = None, obs_coords: None | bytes | Slice[bytes] | Sequence[bytes] | float | Slice[float] | Sequence[float] | int | Slice[int] | Sequence[int] | slice | Slice[slice] | Sequence[slice] | str | Slice[str] | Sequence[str] | datetime64 | Slice[datetime64] | Sequence[datetime64] | TimestampType | Slice[TimestampType] | Sequence[TimestampType] | Array | ChunkedArray | ndarray[Any, dtype[integer]] | ndarray[Any, dtype[datetime64]] = None, var_value_filter: str | None = None, var_coords: None | bytes | Slice[bytes] | Sequence[bytes] | float | Slice[float] | Sequence[float] | int | Slice[int] | Sequence[int] | slice | Slice[slice] | Sequence[slice] | str | Slice[str] | Sequence[str] | datetime64 | Slice[datetime64] | Sequence[datetime64] | TimestampType | Slice[TimestampType] | Sequence[TimestampType] | Array | ChunkedArray | ndarray[Any, dtype[integer]] | ndarray[Any, dtype[datetime64]] = None, column_names: AxisColumnNames | None = None) → AnnData
Convenience wrapper around a soma.Experiment query: builds and executes the query, and returns the result as an anndata.AnnData object.
- Parameters:
census – The census object, usually returned by cellxgene_census.open_soma().
organism – The organism to query, usually one of Homo sapiens or Mus musculus.
measurement_name – The measurement object to query. Defaults to RNA.
X_name – The X layer to query. Defaults to raw.
obs_value_filter – Value filter for the obs metadata. Value is a filter query written in the SOMA value_filter syntax.
obs_coords – Coordinates for the obs axis, which is indexed by the soma_joinid value. May be an int, a list of int, or a slice. The default, None, selects all.
var_value_filter – Value filter for the var metadata. Value is a filter query written in the SOMA value_filter syntax.
var_coords – Coordinates for the var axis, which is indexed by the soma_joinid value. May be an int, a list of int, or a slice. The default, None, selects all.
column_names – Columns to fetch for the obs and var dataframes.
- Returns:
An anndata.AnnData object containing the census slice.
Lifecycle
Experimental.
Examples
>>> get_anndata(census, "Mus musculus", obs_value_filter="tissue_general in ['brain', 'lung']")
>>> get_anndata(census, "Homo sapiens", column_names={"obs": ["tissue"]})
>>> get_anndata(census, "Homo sapiens", obs_coords=slice(0, 1000))
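When the set of values to match comes from Python data, the obs_value_filter string in the first example can be built programmatically. This is a minimal sketch: in_filter is a hypothetical helper, and it assumes the SOMA value_filter syntax accepts the single-quoted, bracketed list form shown in that example. Values that themselves contain quotes are escaped Python-style by repr(), which may need extra care.

```python
from typing import Iterable


def in_filter(column: str, values: Iterable[str]) -> str:
    """Hypothetical helper: build an `in` expression for obs_value_filter/var_value_filter."""
    # repr() of a list of str gives the bracketed, single-quoted form used in the
    # example above, e.g. ['brain', 'lung'].
    return f"{column} in {list(values)!r}"


obs_filter = in_filter("tissue_general", ["brain", "lung"])
print(obs_filter)  # tissue_general in ['brain', 'lung']
# obs_filter can then be passed on, e.g.:
#   get_anndata(census, "Mus musculus", obs_value_filter=obs_filter)
```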
Feature presence matrix
- cellxgene_census.get_presence_matrix(census: Collection, organism: str, measurement_name: str = 'RNA') → csr_matrix
Read the feature dataset presence matrix and return it as a SciPy sparse CSR array. The returned sparse matrix is indexed on the first dimension by the dataset soma_joinid values, and on the second dimension by the var DataFrame soma_joinid values.
- Parameters:
census – The census from which to read the presence matrix.
organism – The organism to query, usually one of Homo sapiens or Mus musculus.
measurement_name – The measurement object to query. Defaults to RNA.
- Returns:
A scipy.sparse.csr_array object containing the presence matrix.
- Raises:
ValueError – if the organism cannot be found.
Lifecycle
Experimental.
Examples
>>> get_presence_matrix(census, "Homo sapiens", "RNA")
<321x60554 sparse array of type '<class 'numpy.uint8'>' with 6441269 stored elements in Compressed Sparse Row format>
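Because rows are dataset soma_joinid values and columns are var soma_joinid values, two common questions (how many features each dataset contains, and which datasets contain a given feature) can be answered directly from the CSR structure. The sketch below uses a tiny hand-built matrix in place of real Census data; the helper names are hypothetical, and the sketch assumes the matrix stores no explicit zeros.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy presence matrix: 3 datasets (rows) x 4 features (columns); 1 = feature present.
# Row 0 has features {0, 2}, row 1 has {1, 2, 3}, row 2 has {2}.
presence = csr_matrix(
    (np.ones(6, dtype=np.uint8), np.array([0, 2, 1, 2, 3, 2]), np.array([0, 2, 5, 6])),
    shape=(3, 4),
)


def features_per_dataset(m) -> np.ndarray:
    """Number of present features per dataset (stored entries per CSR row)."""
    # In CSR format, indptr[i + 1] - indptr[i] is the number of stored entries in row i.
    return np.diff(m.indptr)


def datasets_with_feature(m, var_joinid: int) -> list:
    """Dataset soma_joinids (row indices) in which the given feature is present."""
    # Convert to CSC so one column's row indices form a contiguous slice.
    csc = m.tocsc()
    rows = csc.indices[csc.indptr[var_joinid]:csc.indptr[var_joinid + 1]]
    return sorted(int(r) for r in rows)


print(features_per_dataset(presence))
print(datasets_with_feature(presence, 2))
```

The same helpers only read indptr, indices, and tocsc(), which csr_matrix and csr_array both provide.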
Versioning of Cell Census builds
- cellxgene_census.get_census_version_description(census_version: str) → CensusVersionDescription
Get the release description for the given Census version from the Census release directory.
- Parameters:
census_version – The census version name.
- Returns:
CensusVersionDescription - a dictionary containing a description of the release.
- Raises:
KeyError – if the census_version value is unknown.
Lifecycle
Experimental.
See also
get_census_version_directory(): returns the entire directory as a dict.
Examples
>>> cellxgene_census.get_census_version_description("latest")
{'release_date': None, 'release_build': '2022-12-01', 'soma': {'uri': 's3://cellxgene-data-public/cell-census/2022-12-01/soma/', 's3_region': 'us-west-2'}, 'h5ads': {'uri': 's3://cellxgene-data-public/cell-census/2022-12-01/h5ads/', 's3_region': 'us-west-2'}}
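A release description bundles two locations, one for the SOMA objects and one for the source H5ADs, each as a URI plus S3 region. The sketch below pulls both out of the dictionary shown in the example. locations is a hypothetical helper; the SOMA URI it returns is the kind of value the uri parameter of open_soma() accepts.

```python
from typing import Any, Dict, Tuple

# Copied from the get_census_version_description("latest") example above.
DESCRIPTION: Dict[str, Any] = {
    "release_date": None,
    "release_build": "2022-12-01",
    "soma": {"uri": "s3://cellxgene-data-public/cell-census/2022-12-01/soma/", "s3_region": "us-west-2"},
    "h5ads": {"uri": "s3://cellxgene-data-public/cell-census/2022-12-01/h5ads/", "s3_region": "us-west-2"},
}


def locations(desc: Dict[str, Any]) -> Dict[str, Tuple[str, str]]:
    """Hypothetical helper: (uri, s3_region) for each data product in a release."""
    return {key: (desc[key]["uri"], desc[key]["s3_region"]) for key in ("soma", "h5ads")}


soma_uri, soma_region = locations(DESCRIPTION)["soma"]
print(soma_uri, soma_region)
```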
- cellxgene_census.get_census_version_directory() → Dict[str, CensusVersionDescription]
Get the directory of Census releases currently available.
- Returns:
A dictionary that contains release names and their corresponding release description.
Lifecycle
Experimental.
See also
get_census_version_description(): get the description for a single census_version.
Examples
>>> cellxgene_census.get_census_version_directory()
{'latest': {'release_date': None, 'release_build': '2022-12-01', 'soma': {'uri': 's3://cellxgene-data-public/cell-census/2022-12-01/soma/', 's3_region': 'us-west-2'}, 'h5ads': {'uri': 's3://cellxgene-data-public/cell-census/2022-12-01/h5ads/', 's3_region': 'us-west-2'}}, '2022-12-01': {'release_date': None, 'release_build': '2022-12-01', 'soma': {'uri': 's3://cellxgene-data-public/cell-census/2022-12-01/soma/', 's3_region': 'us-west-2'}, 'h5ads': {'uri': 's3://cellxgene-data-public/cell-census/2022-12-01/h5ads/', 's3_region': 'us-west-2'}}, '2022-11-29': {'release_date': None, 'release_build': '2022-11-29', 'soma': {'uri': 's3://cellxgene-data-public/cell-census/2022-11-29/soma/', 's3_region': 'us-west-2'}, 'h5ads': {'uri': 's3://cellxgene-data-public/cell-census/2022-11-29/h5ads/', 's3_region': 'us-west-2'}}}
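In the example directory, "latest" and "2022-12-01" describe the same build. The sketch below separates alias names from concrete builds. It is a heuristic of this sketch only: aliases and builds are hypothetical helpers, a name is treated as an alias whenever it differs from its own release_build, and the directory is trimmed to the release_build fields.

```python
from typing import Any, Dict, List

# Trimmed copy of the directory in the example above (only release_build kept).
DIRECTORY: Dict[str, Dict[str, Any]] = {
    "latest": {"release_build": "2022-12-01"},
    "2022-12-01": {"release_build": "2022-12-01"},
    "2022-11-29": {"release_build": "2022-11-29"},
}


def aliases(directory: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Hypothetical helper: map alias names such as 'latest' to the build they point to."""
    # Heuristic: a name is treated as an alias when it differs from its release_build.
    return {
        name: desc["release_build"]
        for name, desc in directory.items()
        if name != desc["release_build"]
    }


def builds(directory: Dict[str, Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: distinct builds, oldest first."""
    # ISO-format dates sort chronologically as plain strings.
    return sorted({desc["release_build"] for desc in directory.values()})


print(aliases(DIRECTORY))
print(builds(DIRECTORY))
```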