czbenchmarks.tasks.utils
========================

.. py:module:: czbenchmarks.tasks.utils


Attributes
----------

.. autoapisummary::

   czbenchmarks.tasks.utils.logger
   czbenchmarks.tasks.utils.MULTI_DATASET_TASK_NAMES
   czbenchmarks.tasks.utils.TASK_NAMES


Functions
---------

.. autoapisummary::

   czbenchmarks.tasks.utils.cluster_embedding
   czbenchmarks.tasks.utils.filter_minimum_class
   czbenchmarks.tasks.utils.run_standard_scrna_workflow


Module Contents
---------------

.. py:data:: logger

.. py:data:: MULTI_DATASET_TASK_NAMES

.. py:data:: TASK_NAMES

.. py:function:: cluster_embedding(adata: anndata.AnnData, obsm_key: str = OBSM_KEY, random_seed: int = RANDOM_SEED, n_iterations: int = 2, flavor: str = FLAVOR, key_added: str = KEY_ADDED) -> List[int]

   Cluster cells in embedding space using the Leiden algorithm.

   Computes nearest neighbors in the embedding space and runs the Leiden
   community detection algorithm to identify clusters.

   :param adata: AnnData object containing the embedding
   :param obsm_key: Key in adata.obsm containing the embedding coordinates
   :param random_seed: Random seed for reproducibility
   :param n_iterations: Number of iterations for the Leiden algorithm
   :param flavor: Flavor of the Leiden algorithm
   :param key_added: Key in adata.obs to store the cluster assignments

   :returns: List of cluster assignments as integers


.. py:function:: filter_minimum_class(features: numpy.ndarray, labels: numpy.ndarray | pandas.Series, min_class_size: int = 10) -> tuple[numpy.ndarray, numpy.ndarray | pandas.Series]

   Filter data to remove classes with too few samples.

   Removes classes that have fewer samples than the minimum threshold.
   Useful for ensuring enough samples per class for ML tasks.

   :param features: Feature matrix of shape (n_samples, n_features)
   :param labels: Labels array of shape (n_samples,)
   :param min_class_size: Minimum number of samples required per class

   :returns: - Filtered feature matrix
             - Filtered labels as categorical data
   :rtype: Tuple containing


.. py:function:: run_standard_scrna_workflow(adata: anndata.AnnData, n_top_genes: int = 3000, n_pcs: int = 50, random_state: int = 42) -> anndata.AnnData

   Run a standard preprocessing workflow for single-cell RNA-seq data.

   This function performs common preprocessing steps for scRNA-seq analysis:

   1. Normalization of counts per cell
   2. Log transformation
   3. Identification of highly variable genes
   4. Subsetting to highly variable genes
   5. Principal component analysis

   :param adata: AnnData object containing the raw count data
   :param n_top_genes: Number of highly variable genes to select
   :param n_pcs: Number of principal components to compute
   :param random_state: Random seed for reproducibility
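
The five steps listed for ``run_standard_scrna_workflow`` can be illustrated with a NumPy-only sketch on a dense count matrix. This is not the czbenchmarks implementation, which operates on an ``AnnData`` object and is not shown in this reference. The name ``standard_workflow_sketch``, the ``target_sum`` value, and the plain variance ranking used in place of a proper highly-variable-gene method are all assumptions made for illustration. Because the sketch uses an exact SVD, it needs no random seed.

```python
import numpy as np


def standard_workflow_sketch(counts, n_top_genes=3000, n_pcs=50, target_sum=1e4):
    """Hypothetical NumPy-only walk through the five documented steps."""
    counts = np.asarray(counts, dtype=float)

    # 1. Normalize counts per cell so each cell sums to target_sum (assumed value).
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    normalized = counts / totals * target_sum

    # 2. Log transformation: log(1 + x).
    logged = np.log1p(normalized)

    # 3. Rank genes by variance, a simple stand-in for HVG identification.
    n_top_genes = min(n_top_genes, logged.shape[1])
    variances = logged.var(axis=0)
    top_idx = np.sort(np.argsort(variances)[::-1][:n_top_genes])

    # 4. Subset to the selected genes.
    subset = logged[:, top_idx]

    # 5. PCA via SVD of the mean-centered matrix.
    centered = subset - subset.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    n_pcs = min(n_pcs, s.shape[0])
    pcs = u[:, :n_pcs] * s[:n_pcs]
    return top_idx, pcs


# Toy data: 20 cells by 30 genes of Poisson counts.
rng = np.random.default_rng(42)
toy_counts = rng.poisson(lam=2.0, size=(20, 30))
genes, pcs = standard_workflow_sketch(toy_counts, n_top_genes=10, n_pcs=5)
print(genes.shape, pcs.shape)
```

The sketch returns the selected gene indices and the cell-by-component score matrix, whereas the documented function returns an ``AnnData`` object.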
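
The filtering rule described for ``filter_minimum_class`` (drop every class with fewer than ``min_class_size`` samples) can be sketched with NumPy as follows. The helper name ``filter_minimum_class_sketch`` is hypothetical, and the sketch returns labels as a plain array rather than the categorical data the documented function returns.

```python
import numpy as np


def filter_minimum_class_sketch(features, labels, min_class_size=10):
    """Keep only samples whose class has at least min_class_size members."""
    labels = np.asarray(labels)
    # Count how many samples each class has.
    classes, counts = np.unique(labels, return_counts=True)
    keep_classes = classes[counts >= min_class_size]
    # Boolean mask over samples that belong to a sufficiently large class.
    mask = np.isin(labels, keep_classes)
    return features[mask], labels[mask]


features = np.arange(12).reshape(6, 2)
labels = np.array(["a", "a", "a", "b", "b", "c"])
kept_features, kept_labels = filter_minimum_class_sketch(
    features, labels, min_class_size=2
)
print(kept_features.shape, kept_labels.tolist())
```

With ``min_class_size=2``, class ``c`` has a single sample and is removed, while classes ``a`` and ``b`` are kept.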