cellxgene_census.experimental.pp.highly_variable_genes

cellxgene_census.experimental.pp.highly_variable_genes(query: ExperimentAxisQuery, n_top_genes: int = 1000, layer: str = 'raw', flavor: Literal['seurat_v3'] = 'seurat_v3', span: float = 0.3, batch_key: str | Sequence[str] | None = None, max_loess_jitter: float = 1e-06, batch_key_func: Callable[[...], Any] | None = None) → DataFrame

Identify and annotate highly variable genes contained in the query results. The API is modelled on ScanPy's scanpy.pp.highly_variable_genes API, and the returned results mimic ScanPy's. The only flavor available is the Seurat V3 method, which assumes count data in the X layer.

See https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.highly_variable_genes.html#scanpy.pp.highly_variable_genes for more information on this method.
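To make the Seurat V3 idea concrete, the following is a minimal NumPy sketch of the selection step as ScanPy describes it: fit the expected variance from the mean on a log scale, clip outlying counts, and rank genes by the variance of the standardized, clipped counts. It is an illustration only, not this function's implementation. The function name seurat_v3_sketch is hypothetical, and a quadratic np.polyfit is swapped in for the LOESS fit (controlled by span) that the real method uses.

```python
import numpy as np


def seurat_v3_sketch(counts: np.ndarray, n_top_genes: int) -> np.ndarray:
    """Return a boolean mask marking the n_top_genes most variable genes.

    counts: cells x genes matrix of raw, non-negative counts.
    ASSUMPTION: a quadratic polynomial replaces the LOESS fit of the real method.
    """
    n_cells = counts.shape[0]
    mean = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)

    # Only genes with non-zero variance can be fit on a log scale.
    ok = var > 0

    # Model log10(variance) as a function of log10(mean): the "expected" variance.
    coeffs = np.polyfit(np.log10(mean[ok]), np.log10(var[ok]), deg=2)
    reg_std = np.zeros_like(mean)
    reg_std[ok] = np.sqrt(10 ** np.polyval(coeffs, np.log10(mean[ok])))

    # Clip each count at mean + sqrt(n_cells) * expected std to damp outliers.
    clip_val = mean + reg_std * np.sqrt(n_cells)
    clipped = np.minimum(counts, clip_val)

    # Variance of the standardized, clipped counts; constant genes stay at 0.
    norm_var = np.zeros_like(mean)
    sq_dev = ((clipped[:, ok] - mean[ok]) ** 2).sum(axis=0)
    norm_var[ok] = sq_dev / ((n_cells - 1) * reg_std[ok] ** 2)

    # Flag the genes with the largest normalized variance.
    top = np.argsort(-norm_var)[:n_top_genes]
    mask = np.zeros(counts.shape[1], dtype=bool)
    mask[top] = True
    return mask
```

In this sketch a gene whose variance greatly exceeds what its mean predicts (for example, counts alternating between 0 and 20) ranks above Poisson-like genes of similar mean.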

Parameters:
  • query – A SOMA query, specifying the obs/var selection over which genes are annotated.

  • n_top_genes – Number of highly variable genes to select.

  • layer – Name of the X layer to use, e.g., raw.

  • flavor – Method used to annotate genes. Must be seurat_v3.

  • span – For seurat_v3 flavor, the fraction of obs/cells used to estimate the loess variance model fit.

  • batch_key

    If specified, gene selection is done per batch and the results are combined. Specify the obs column name, or list of column names, identifying the batches. If not specified, gene selection is done over all cells as a single batch.

    If multiple batch keys are specified, and no batch_key_func is specified, the batch key will be generated by converting values to string and concatenating them.

  • max_loess_jitter – The maximum jitter to add to the data in case of LOESS failure (which can occur when the dataset has low entry counts).

  • batch_key_func – Optional function to create a user-defined batch key. The function is called once per row of the obs dataframe and receives a single argument: a Pandas Series containing that row's values for the column(s) named in the batch_key argument.
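The batch_key and batch_key_func rules above can be pictured with a small pandas sketch that turns obs columns into one batch label per cell. The helper make_batch_labels is hypothetical and is not part of cellxgene_census; its default (stringify each value and concatenate with no separator) follows the description above, but the library's exact label format may differ.

```python
from typing import Any, Callable, Optional, Sequence, Union

import pandas as pd


def make_batch_labels(
    obs: pd.DataFrame,
    batch_key: Union[str, Sequence[str]],
    batch_key_func: Optional[Callable[[pd.Series], Any]] = None,
) -> pd.Series:
    """Hypothetical helper: derive one batch label per obs row."""
    keys = [batch_key] if isinstance(batch_key, str) else list(batch_key)
    subset = obs[keys]
    if batch_key_func is None:
        # Default as documented: convert each value to str and concatenate.
        # ASSUMPTION: no separator is inserted between values.
        def batch_key_func(row: pd.Series) -> str:
            return "".join(str(v) for v in row)
    # The function runs once per row; each row is a Series indexed by key name.
    return subset.apply(batch_key_func, axis=1)
```

For example, with dataset_id and donor_id as keys, a row with values "a" and "99" would get the label "a99", while a custom function can collapse many donors into a few named batches.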

Returns:

Pandas DataFrame containing annotations for all var values specified by the query argument. Annotations are identical to those produced by scanpy.pp.highly_variable_genes.
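Since the annotations follow ScanPy's, the returned frame can be filtered on its boolean highly_variable column. The sketch below assumes the highly_variable and highly_variable_rank columns that ScanPy's seurat_v3 flavor produces; the helper name top_gene_ids is hypothetical, and it simply returns whatever index labels the frame carries.

```python
import pandas as pd


def top_gene_ids(hvg: pd.DataFrame) -> list:
    """Hypothetical helper: index labels of highly variable genes, best rank first."""
    # ASSUMPTION: ScanPy-style 'highly_variable' (bool) and
    # 'highly_variable_rank' (float, lower is better) columns are present.
    selected = hvg[hvg["highly_variable"]]
    return selected.sort_values("highly_variable_rank").index.tolist()
```

Applied to the result of one of the examples below, e.g. top_gene_ids(hvg), this would give the var index labels to keep for downstream analysis.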

Raises:

ValueError – if the flavor parameter is not seurat_v3.

Examples

Fetch a Pandas DataFrame containing var annotations for the query selection, using dataset_id as the batch key:

>>> hvg = highly_variable_genes(query, batch_key="dataset_id")

Fetch highly variable genes, using the concatenation of dataset_id and donor_id as the batch key:

>>> hvg = highly_variable_genes(query, batch_key=["dataset_id", "donor_id"])

Fetch highly variable genes, with a user-defined batch key function:

>>> hvg = highly_variable_genes(
...     query,
...     batch_key="donor_id",
...     batch_key_func=lambda s: "batch0" if s.donor_id == "99" else "batch1",
... )

Lifecycle

experimental