cellxgene_census.experimental.pp.get_highly_variable_genes
- cellxgene_census.experimental.pp.get_highly_variable_genes(census: Collection, organism: str, measurement_name: str = 'RNA', X_name: str = 'raw', obs_value_filter: str | None = None, obs_coords: None | bytes | Slice[bytes] | Sequence[bytes] | float | Slice[float] | Sequence[float] | int | Slice[int] | Sequence[int] | slice | Slice[slice] | Sequence[slice] | str | Slice[str] | Sequence[str] | datetime64 | Slice[datetime64] | Sequence[datetime64] | TimestampType | Slice[TimestampType] | Sequence[TimestampType] | Array | ChunkedArray | ndarray[Any, dtype[integer]] | ndarray[Any, dtype[datetime64]] = None, var_value_filter: str | None = None, var_coords: None | bytes | Slice[bytes] | Sequence[bytes] | float | Slice[float] | Sequence[float] | int | Slice[int] | Sequence[int] | slice | Slice[slice] | Sequence[slice] | str | Slice[str] | Sequence[str] | datetime64 | Slice[datetime64] | Sequence[datetime64] | TimestampType | Slice[TimestampType] | Sequence[TimestampType] | Array | ChunkedArray | ndarray[Any, dtype[integer]] | ndarray[Any, dtype[datetime64]] = None, n_top_genes: int = 1000, flavor: Literal['seurat_v3'] = 'seurat_v3', span: float = 0.3, batch_key: str | Sequence[str] | None = None, max_loess_jitter: float = 1e-06, batch_key_func: Callable[[...], Any] | None = None) → DataFrame
Convenience wrapper around a tiledbsoma.Experiment query and the cellxgene_census.experimental.pp.highly_variable_genes() function. It builds and executes a query, then annotates the query result genes (the var dataframe) based upon variability.
- Parameters:
census – The Census object, usually returned by open_soma().
organism – The organism to query, usually one of "Homo sapiens" or "Mus musculus".
measurement_name – The measurement object to query. Defaults to "RNA".
X_name – The X layer to query. Defaults to "raw".
obs_value_filter – Value filter for the obs metadata. Value is a filter query written in the SOMA value_filter syntax.
obs_coords – Coordinates for the obs axis, which is indexed by the soma_joinid value. May be an int, a list of int, or a slice. The default, None, selects all.
var_value_filter – Value filter for the var metadata. Value is a filter query written in the SOMA value_filter syntax.
var_coords – Coordinates for the var axis, which is indexed by the soma_joinid value. May be an int, a list of int, or a slice. The default, None, selects all.
n_top_genes – Number of genes to rank.
flavor – Method used to annotate genes. Must be "seurat_v3".
span – If flavor="seurat_v3", the fraction of obs/cells used to estimate the LOESS variance model fit.
batch_key – If specified, gene selection will be done by batch and combined. Specify the obs column name, or list of column names, identifying the batches. If not specified, all gene selection is done as a single batch. If multiple batch keys are specified, and no batch_key_func is specified, the batch key will be generated by converting values to string and concatenating them.
max_loess_jitter – The maximum jitter to add to data in case of LOESS failure (can occur when the dataset has low entry counts).
batch_key_func – Optional function to create a user-defined batch key. Function will be called once per row in the obs dataframe. Function will receive a single argument: a pandas.Series containing values specified in the batch_key argument.
- Returns:
pandas.DataFrame containing annotations for all var values specified by the query.
- Raises:
ValueError – if the flavor parameter is not "seurat_v3".
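The batch_key and batch_key_func behaviour described under Parameters can be sketched in plain Python. This is a minimal illustration, not the library's implementation: make_batch_key and by_dataset_and_donor are hypothetical names, plain dicts stand in for the pandas.Series that the real function passes, and joining the string values with no separator is an assumption based only on the wording "converting values to string and concatenating them".

```python
from typing import Any, Callable, Mapping, Optional, Sequence, Union


def make_batch_key(
    row: Mapping[str, Any],
    batch_key: Union[str, Sequence[str]],
    batch_key_func: Optional[Callable[[Mapping[str, Any]], Any]] = None,
) -> Any:
    """Hypothetical helper: derive one batch label for a single obs row."""
    columns = [batch_key] if isinstance(batch_key, str) else list(batch_key)
    # Only the columns named in batch_key are handed to the user function.
    selected = {name: row[name] for name in columns}
    if batch_key_func is not None:
        return batch_key_func(selected)
    if len(columns) == 1:
        return selected[columns[0]]
    # Assumption: values are converted to str and joined with no separator.
    return "".join(str(selected[name]) for name in columns)


def by_dataset_and_donor(values: Mapping[str, Any]) -> str:
    """Example user-defined batch key with an explicit separator."""
    return f"{values['dataset_id']}/{values['donor_id']}"


# Toy obs rows; the values are made up for illustration.
obs_rows = [
    {"dataset_id": "d1", "donor_id": "A", "tissue_general": "lung"},
    {"dataset_id": "d1", "donor_id": "B", "tissue_general": "lung"},
    {"dataset_id": "d2", "donor_id": "A", "tissue_general": "lung"},
]

single_keys = [make_batch_key(r, "dataset_id") for r in obs_rows]
default_keys = [make_batch_key(r, ["dataset_id", "donor_id"]) for r in obs_rows]
custom_keys = [
    make_batch_key(r, ["dataset_id", "donor_id"], by_dataset_and_donor)
    for r in obs_rows
]
```

A function shaped like by_dataset_and_donor could be passed as batch_key_func together with batch_key=["dataset_id", "donor_id"]; since a pandas.Series also supports values["column"] lookups, the same body should work unchanged there.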
Examples
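Fetch a pandas.DataFrame containing var annotations for a subset of the cells matching the obs_value_filter:
>>> from cellxgene_census.experimental.pp import get_highly_variable_genes
>>> # census is an open Census handle, e.g. from cellxgene_census.open_soma()
>>> hvg = get_highly_variable_genes(
...     census,
...     organism="Mus musculus",
...     obs_value_filter="is_primary_data == True and tissue_general == 'lung'",
...     n_top_genes=500,
... )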
Fetch an anndata.AnnData with top 500 genes:
>>> import cellxgene_census.experimental.pp  # also binds the top-level cellxgene_census package
>>> with cellxgene_census.open_soma(census_version="stable") as census:
...     organism = "mus_musculus"
...     obs_value_filter = "is_primary_data == True and tissue_general == 'lung'"
...     # Get the highly variable genes
...     hvg = cellxgene_census.experimental.pp.get_highly_variable_genes(
...         census,
...         organism=organism,
...         obs_value_filter=obs_value_filter,
...         n_top_genes=500,
...     )
...     # Fetch AnnData - all cells matching obs_value_filter, just the HVGs
...     hvg_soma_ids = hvg[hvg.highly_variable].index.values
...     adata = cellxgene_census.get_anndata(
...         census,
...         organism=organism,
...         obs_value_filter=obs_value_filter,
...         var_coords=hvg_soma_ids,
...     )
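The filter string and coordinate arguments used in the examples above can be assembled programmatically before the call. The sketch below is only an illustration: build_hvg_query_kwargs is a hypothetical helper, not part of cellxgene_census, and it does nothing more than build a dict of keyword arguments named after the parameters documented here.

```python
from typing import Any, Dict, Optional, Sequence, Union

# The documented coordinate forms: None (select all), an int, a list of int, or a slice.
Coords = Union[None, int, Sequence[int], slice]


def build_hvg_query_kwargs(
    tissue_general: str,
    primary_only: bool = True,
    n_top_genes: int = 1000,
    var_coords: Optional[Coords] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for get_highly_variable_genes."""
    clauses = []
    if primary_only:
        clauses.append("is_primary_data == True")
    # String literals inside a value filter are quoted, as in the examples above.
    clauses.append(f"tissue_general == '{tissue_general}'")
    return {
        "obs_value_filter": " and ".join(clauses),
        "var_coords": var_coords,
        "n_top_genes": n_top_genes,
    }


kwargs = build_hvg_query_kwargs("lung", n_top_genes=500, var_coords=slice(0, 99))
```

The result could then be passed as get_highly_variable_genes(census, organism="Mus musculus", **kwargs). Note that SOMA coordinate slices are generally documented as including both endpoints, unlike Python list slicing, so it is worth checking that behaviour against the installed tiledbsoma version.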
Lifecycle
experimental