cellxgene_census.experimental.pp.get_highly_variable_genes

cellxgene_census.experimental.pp.get_highly_variable_genes(census: Collection, organism: str, measurement_name: str = 'RNA', X_name: str = 'raw', obs_value_filter: str | None = None, obs_coords: None | bytes | Slice[bytes] | Sequence[bytes] | float | Slice[float] | Sequence[float] | int | Slice[int] | Sequence[int] | slice | Slice[slice] | Sequence[slice] | str | Slice[str] | Sequence[str] | datetime64 | Slice[datetime64] | Sequence[datetime64] | TimestampType | Slice[TimestampType] | Sequence[TimestampType] | Array | ChunkedArray | ndarray[Any, dtype[integer]] | ndarray[Any, dtype[datetime64]] = None, var_value_filter: str | None = None, var_coords: None | bytes | Slice[bytes] | Sequence[bytes] | float | Slice[float] | Sequence[float] | int | Slice[int] | Sequence[int] | slice | Slice[slice] | Sequence[slice] | str | Slice[str] | Sequence[str] | datetime64 | Slice[datetime64] | Sequence[datetime64] | TimestampType | Slice[TimestampType] | Sequence[TimestampType] | Array | ChunkedArray | ndarray[Any, dtype[integer]] | ndarray[Any, dtype[datetime64]] = None, n_top_genes: int = 1000, flavor: Literal['seurat_v3'] = 'seurat_v3', span: float = 0.3, batch_key: str | Sequence[str] | None = None, max_loess_jitter: float = 1e-06, batch_key_func: Callable[[...], Any] | None = None) → DataFrame

Convenience wrapper around a soma.Experiment query and the highly_variable_genes function. It builds and executes a query, then annotates the genes in the query result (the var dataframe) based on their variability.

See highly_variable_genes for more information on this function.

Parameters:
  • census – The census object, usually returned by cellxgene_census.open_soma().

  • organism – The organism to query, usually one of Homo sapiens or Mus musculus.

  • measurement_name – The measurement object to query. Defaults to RNA.

  • X_name – The X layer to query. Defaults to raw.

  • obs_value_filter – Value filter for the obs metadata. Value is a filter query written in the SOMA value_filter syntax.

  • obs_coords – Coordinates for the obs axis, which is indexed by the soma_joinid value. May be an int, a list of int, or a slice. The default, None, selects all.

  • var_value_filter – Value filter for the var metadata. Value is a filter query written in the SOMA value_filter syntax.

  • var_coords – Coordinates for the var axis, which is indexed by the soma_joinid value. May be an int, a list of int, or a slice. The default, None, selects all.

  • n_top_genes – Number of genes to rank.

  • flavor – Method used to annotate genes. Must be seurat_v3.

  • span – For seurat_v3 flavor, the fraction of obs/cells used to estimate the loess variance model fit.

  • batch_key – If specified, gene selection will be done by batch and combined. Specify the obs column name, or list of column names, identifying the batches. If not specified, all gene selection is done as a single batch.

  • max_loess_jitter – The maximum jitter to add to the data if the LOESS fit fails (this can occur when the dataset has low entry counts).
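
The obs_value_filter and var_value_filter parameters take a plain string. As an illustration only, the sketch below composes such a string from simple equality conditions. The helper name build_value_filter is hypothetical and not part of cellxgene_census, and it covers only `==` clauses joined with `and`, not the full SOMA value_filter syntax.

```python
from typing import Mapping, Union

Scalar = Union[bool, int, float, str]


def build_value_filter(conditions: Mapping[str, Scalar]) -> str:
    """Join `column == value` clauses with `and`.

    Hypothetical helper for illustration; not part of cellxgene_census.
    It does not escape quotes inside string values.
    """
    clauses = []
    for column, value in conditions.items():
        # Strings are single-quoted; booleans and numbers use their Python repr.
        literal = f"'{value}'" if isinstance(value, str) else repr(value)
        clauses.append(f"{column} == {literal}")
    return " and ".join(clauses)


obs_value_filter = build_value_filter(
    {"is_primary_data": True, "tissue_general": "lung"}
)
print(obs_value_filter)
```

The resulting string can be passed as obs_value_filter (or var_value_filter, for conditions on var columns).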

Returns:

Pandas DataFrame containing annotations for all var values specified by the query.

Raises:

ValueError – if the flavor parameter is not seurat_v3.

Examples

Fetch Pandas DataFrame containing var annotations for a subset of the cells matching the obs value_filter:

>>> hvg = get_highly_variable_genes(
        census,
        organism="Mus musculus",
        obs_value_filter="is_primary_data == True and tissue_general == 'lung'",
        n_top_genes = 500
    )

Fetch AnnData with top 500 genes:

>>> with cellxgene_census.open_soma(census_version="stable") as census:
        organism = "mus_musculus"
        obs_value_filter = "is_primary_data == True and tissue_general == 'lung'"

        # Get the highly variable genes
        hvg = cellxgene_census.experimental.pp.get_highly_variable_genes(
            census,
            organism=organism,
            obs_value_filter=obs_value_filter,
            n_top_genes = 500
        )

        # Fetch AnnData - all cells matching obs_value_filter, just the HVGs
        hvg_soma_ids = hvg[hvg.highly_variable].index.values
        adata = cellxgene_census.get_anndata(
            census,
            organism=organism,
            obs_value_filter=obs_value_filter,
            var_coords=hvg_soma_ids
        )
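
The batch_key parameter is not exercised by the examples above. The following is a minimal sketch of a per-batch query, assuming that the obs column dataset_id is a sensible batch identifier for the data being queried. Both helper names (hvg_kwargs and fetch_batched_hvg) are hypothetical. cellxgene_census is imported lazily, so only fetch_batched_hvg needs the package and network access.

```python
from typing import Any, Dict, Optional, Sequence, Union


def hvg_kwargs(
    organism: str,
    obs_value_filter: Optional[str] = None,
    n_top_genes: int = 1000,
    batch_key: Union[str, Sequence[str], None] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for get_highly_variable_genes.

    Hypothetical helper; optional arguments left as None are omitted so the
    library defaults apply.
    """
    kwargs: Dict[str, Any] = {"organism": organism, "n_top_genes": n_top_genes}
    if obs_value_filter is not None:
        kwargs["obs_value_filter"] = obs_value_filter
    if batch_key is not None:
        kwargs["batch_key"] = batch_key
    return kwargs


def fetch_batched_hvg(census_version: str = "stable"):
    """Run the query with one batch per dataset (requires network access)."""
    import cellxgene_census  # lazy import: only needed when the query is run

    kwargs = hvg_kwargs(
        "Mus musculus",
        obs_value_filter="is_primary_data == True and tissue_general == 'lung'",
        n_top_genes=500,
        batch_key="dataset_id",  # assumption: each dataset is treated as a batch
    )
    with cellxgene_census.open_soma(census_version=census_version) as census:
        return cellxgene_census.experimental.pp.get_highly_variable_genes(
            census, **kwargs
        )
```

Calling fetch_batched_hvg() returns the same kind of DataFrame as the examples above, with gene selection done per batch and then combined, as described for batch_key.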

Lifecycle

experimental