cellxgene_census.experimental.pp.get_highly_variable_genes
- cellxgene_census.experimental.pp.get_highly_variable_genes(census: Collection, organism: str, measurement_name: str = 'RNA', X_name: str = 'raw', obs_value_filter: str | None = None, obs_coords: None | bytes | Slice[bytes] | Sequence[bytes] | float | Slice[float] | Sequence[float] | int | Slice[int] | Sequence[int] | slice | Slice[slice] | Sequence[slice] | str | Slice[str] | Sequence[str] | datetime64 | Slice[datetime64] | Sequence[datetime64] | TimestampType | Slice[TimestampType] | Sequence[TimestampType] | Array | ChunkedArray | ndarray[Any, dtype[integer]] | ndarray[Any, dtype[datetime64]] = None, var_value_filter: str | None = None, var_coords: None | bytes | Slice[bytes] | Sequence[bytes] | float | Slice[float] | Sequence[float] | int | Slice[int] | Sequence[int] | slice | Slice[slice] | Sequence[slice] | str | Slice[str] | Sequence[str] | datetime64 | Slice[datetime64] | Sequence[datetime64] | TimestampType | Slice[TimestampType] | Sequence[TimestampType] | Array | ChunkedArray | ndarray[Any, dtype[integer]] | ndarray[Any, dtype[datetime64]] = None, n_top_genes: int = 1000, flavor: Literal['seurat_v3'] = 'seurat_v3', span: float = 0.3, batch_key: str | Sequence[str] | None = None, max_loess_jitter: float = 1e-06, batch_key_func: Callable[[...], Any] | None = None) DataFrame
- Convenience wrapper around a tiledbsoma.Experiment query and the cellxgene_census.experimental.pp.highly_variable_genes() function. It builds and executes a query, then annotates the genes in the query result (the var dataframe) based on their variability.
- Parameters:
- census – The Census object, usually returned by open_soma().
- organism – The organism to query, usually one of "Homo sapiens" or "Mus musculus".
- measurement_name – The measurement object to query. Defaults to "RNA".
- X_name – The X layer to query. Defaults to "raw".
- obs_value_filter – Value filter for the obs metadata. Value is a filter query written in the SOMA value_filter syntax.
- obs_coords – Coordinates for the obs axis, which is indexed by the soma_joinid value. May be an int, a list of int, or a slice. The default, None, selects all.
- var_value_filter – Value filter for the var metadata. Value is a filter query written in the SOMA value_filter syntax.
- var_coords – Coordinates for the var axis, which is indexed by the soma_joinid value. May be an int, a list of int, or a slice. The default, None, selects all.
- n_top_genes – Number of genes to rank.
- flavor – Method used to annotate genes. Must be "seurat_v3".
- span – If flavor="seurat_v3", the fraction of obs/cells used to estimate the LOESS variance model fit.
- batch_key – If specified, gene selection will be done by batch and combined. Specify the obs column name, or list of column names, identifying the batches. If not specified, all gene selection is done as a single batch. If multiple batch keys are specified, and no batch_key_func is specified, the batch key will be generated by converting values to string and concatenating them. 
- max_loess_jitter – The maximum jitter to add to the data in case of LOESS failure (which can occur when the dataset has low entry counts).
- batch_key_func – Optional function to create a user-defined batch key. Function will be called once per row in the obs dataframe. Function will receive a single argument: a pandas.Series containing values specified in the batch_key argument.
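The interaction between batch_key and batch_key_func can be illustrated with pandas alone. The following is a minimal sketch, not the library's implementation: the obs columns, the helper names concat_batch_key and dataset_only, and the absence of a separator in the concatenated key are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical obs metadata; the column names are illustrative only.
obs = pd.DataFrame(
    {
        "dataset_id": ["d1", "d1", "d2"],
        "donor_id": ["a", "b", "a"],
    }
)
batch_key = ["dataset_id", "donor_id"]


def concat_batch_key(row: pd.Series) -> str:
    """Mimic the documented default: convert values to str and concatenate.

    Assumption: no separator is used; the library may differ.
    """
    return "".join(str(value) for value in row)


def dataset_only(row: pd.Series) -> str:
    """Hypothetical user-defined key that ignores the donor and batches by dataset."""
    return row["dataset_id"]


# Each function is called once per row, receiving a Series of the batch_key columns.
default_like = obs[batch_key].apply(concat_batch_key, axis=1)
custom = obs[batch_key].apply(dataset_only, axis=1)

print(default_like.tolist())
print(custom.tolist())
```

A function shaped like dataset_only could be passed as batch_key_func together with batch_key=["dataset_id", "donor_id"], assuming those columns exist in the queried obs dataframe.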
 
- Returns:
- pandas.DataFrame containing annotations for all var values specified by the query.
- Raises:
- ValueError – if the flavor parameter is not "seurat_v3".
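The returned dataframe is typically reduced to the selected genes before being passed on as var_coords. The sketch below uses a hand-built stand-in for the result rather than a real Census query. It assumes only a boolean highly_variable column and an index holding soma_joinid values; the real result may carry additional annotation columns.

```python
import pandas as pd

# Stand-in for the dataframe returned by get_highly_variable_genes().
# Assumption: the index holds var soma_joinid values.
hvg = pd.DataFrame(
    {"highly_variable": [True, False, True, False]},
    index=pd.Index([10, 11, 12, 13], name="soma_joinid"),
)

# Keep only rows flagged as highly variable and take their soma_joinid values.
hvg_soma_ids = hvg[hvg.highly_variable].index.values

print(hvg_soma_ids)
```

The resulting array is an integer ndarray, one of the types accepted by var_coords.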
- Examples
- Fetch a pandas.DataFrame containing var annotations for a subset of the cells matching the obs_value_filter:
>>> hvg = get_highly_variable_genes(
...     census,
...     organism="Mus musculus",
...     obs_value_filter="is_primary_data == True and tissue_general == 'lung'",
...     n_top_genes=500,
... )
- Fetch an anndata.AnnData with top 500 genes:
>>> import cellxgene_census
>>> with cellxgene_census.open_soma(census_version="stable") as census:
...     organism = "mus_musculus"
...     obs_value_filter = "is_primary_data == True and tissue_general == 'lung'"
...     # Get the highly variable genes
...     hvg = cellxgene_census.experimental.pp.get_highly_variable_genes(
...         census,
...         organism=organism,
...         obs_value_filter=obs_value_filter,
...         n_top_genes=500,
...     )
...     # Fetch AnnData - all cells matching obs_value_filter, just the HVGs
...     hvg_soma_ids = hvg[hvg.highly_variable].index.values
...     adata = cellxgene_census.get_anndata(
...         census,
...         organism=organism,
...         obs_value_filter=obs_value_filter,
...         var_coords=hvg_soma_ids,
...     )
- Lifecycle
- experimental