cellxgene_census.experimental.pp.highly_variable_genes

cellxgene_census.experimental.pp.highly_variable_genes(query: ExperimentAxisQuery, n_top_genes: int = 1000, layer: str = 'raw', flavor: Literal['seurat_v3'] = 'seurat_v3', span: float = 0.3, batch_key: str | Sequence[str] | None = None, max_loess_jitter: float = 1e-06, batch_key_func: Callable[[...], Any] | None = None) → DataFrame

Identify and annotate highly variable genes contained in the query results. The API is modelled on the Scanpy scanpy.pp.highly_variable_genes API, and the returned results mimic Scanpy's. The only flavor available is the Seurat V3 method, which assumes count data in the X layer.

See https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.highly_variable_genes.html#scanpy.pp.highly_variable_genes for more information on this method.

Parameters:
  • query – A tiledbsoma.ExperimentAxisQuery, specifying the obs/var selection over which genes are annotated.

  • n_top_genes – Number of genes to rank.

  • layer – X layer used, e.g., "raw".

  • flavor – Method used to annotate genes. Must be "seurat_v3".

  • span – If flavor="seurat_v3", the fraction of obs/cells used to estimate the LOESS variance model fit.

  • batch_key – If specified, gene selection will be done by batch and combined. Specify the obs column name, or list of column names, identifying the batches. If not specified, all gene selection is done as a single batch. If multiple batch keys are specified, and no batch_key_func is specified, the batch key will be generated by converting values to string and concatenating them.

  • max_loess_jitter – The maximum jitter to add to the data in case of LOESS failure (which can occur when the dataset has low entry counts).

  • batch_key_func – Optional function to create a user-defined batch key. The function is called once per row in the obs dataframe and receives a single argument: a pandas.Series containing the values specified in the batch_key argument.

Returns:

A pandas.DataFrame containing annotations for all var values specified by the query argument. Annotations are identical to those produced by scanpy.pp.highly_variable_genes().

Raises:

ValueError – if the flavor parameter is not "seurat_v3".

Examples

Fetch a pandas.DataFrame containing var annotations for the query selection, using "dataset_id" as batch_key:

>>> hvg = highly_variable_genes(query, batch_key="dataset_id")
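
The examples here assume an existing query object. A minimal sketch of how one might be obtained and passed in is shown below. The function name fetch_highly_variable_genes, the tissue filter, the "homo_sapiens" experiment, and the "stable" census version are illustrative assumptions, not part of this API. Running it requires the cellxgene_census and tiledbsoma packages and network access.

```python
def fetch_highly_variable_genes(tissue="heart", n_top_genes=500):
    """Hypothetical helper: annotate HVGs for one tissue, batched by dataset."""
    # Imports are local so that merely defining this helper does not
    # require the packages to be installed.
    import cellxgene_census
    import tiledbsoma
    from cellxgene_census.experimental.pp import highly_variable_genes

    # Assumed obs filter; adjust to the cells of interest.
    obs_filter = f"tissue_general == '{tissue}' and is_primary_data == True"

    with cellxgene_census.open_soma(census_version="stable") as census:
        experiment = census["census_data"]["homo_sapiens"]
        with experiment.axis_query(
            measurement_name="RNA",
            obs_query=tiledbsoma.AxisQuery(value_filter=obs_filter),
        ) as query:
            hvg = highly_variable_genes(
                query, n_top_genes=n_top_genes, batch_key="dataset_id"
            )

    # Keep only the rows flagged as highly variable, following the
    # Scanpy-style "highly_variable" annotation column.
    return hvg[hvg["highly_variable"]]
```

The query is opened as a context manager so that its resources are released once the annotations have been computed.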

Fetch highly variable genes, using the concatenation of "dataset_id" and "donor_id" as batch_key:

>>> hvg = highly_variable_genes(query, batch_key=["dataset_id", "donor_id"])
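
To illustrate what "converting values to string and concatenating them" could look like, here is a small stand-alone sketch. It uses plain dictionaries in place of obs rows, and the helper name default_batch_key is hypothetical. The sketch assumes the values are joined with no separator; the exact format the library produces is not stated here.

```python
def default_batch_key(row, batch_key):
    """Assumed behaviour: stringify each selected value and join them."""
    return "".join(str(row[name]) for name in batch_key)


# Toy obs rows standing in for the obs dataframe (illustrative data only).
obs_rows = [
    {"dataset_id": "ds1", "donor_id": 7},
    {"dataset_id": "ds1", "donor_id": 8},
    {"dataset_id": "ds2", "donor_id": 7},
]

labels = [default_batch_key(row, ["dataset_id", "donor_id"]) for row in obs_rows]
print(labels)  # ['ds17', 'ds18', 'ds27']
```

Each distinct combination of values yields its own batch label, so gene selection would run once per label before the results are combined.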

Fetch highly variable genes, with a user-defined batch_key_func:

>>> hvg = highly_variable_genes(
...     query,
...     batch_key="donor_id",
...     batch_key_func=lambda s: "batch0" if s.donor_id == "99" else "batch1",
... )
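
For anything beyond a one-line expression, a named function may be easier to read and test than a lambda. The sketch below builds a batch label from two obs columns with an explicit separator. The function name dataset_donor_batch and the "|" separator are illustrative choices, not part of the API. Because it uses only row["..."] indexing, it can be exercised with a plain dict as well as with a pandas.Series.

```python
def dataset_donor_batch(row):
    """Hypothetical batch_key_func: one label per (dataset, donor) pair."""
    # `row` is expected to hold the columns named in batch_key.
    return f"{row['dataset_id']}|{row['donor_id']}"


# Quick check with a dict standing in for one obs row.
example = dataset_donor_batch({"dataset_id": "ds1", "donor_id": "99"})
print(example)  # ds1|99

# Intended use (requires an open query):
# hvg = highly_variable_genes(
#     query,
#     batch_key=["dataset_id", "donor_id"],
#     batch_key_func=dataset_donor_batch,
# )
```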

Lifecycle

experimental