czbenchmarks.tasks.label_prediction
Attributes
Classes
MetadataLabelPredictionTask: Task for predicting labels from embeddings using cross-validation.
Module Contents
- czbenchmarks.tasks.label_prediction.logger
- class czbenchmarks.tasks.label_prediction.MetadataLabelPredictionTask(label_key: str, n_folds: int = N_FOLDS, random_seed: int = RANDOM_SEED, min_class_size: int = MIN_CLASS_SIZE)
Bases: czbenchmarks.tasks.base.BaseTask
Task for predicting labels from embeddings using cross-validation.
Evaluates multiple classifiers (Logistic Regression, KNN) using k-fold cross-validation. Reports standard classification metrics.
- Parameters:
label_key – Key to access ground truth labels in metadata
n_folds – Number of cross-validation folds
random_seed – Random seed for reproducibility
min_class_size – Minimum samples required per class
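The evaluation flow described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: it assumes that classes with fewer than min_class_size samples are dropped before splitting, that folds are stratified by label, and it swaps in a simple 1-nearest-neighbour classifier in place of the Logistic Regression and KNN models the task reports on. All function names (filter_small_classes, stratified_folds, predict_1nn, cross_validate) are hypothetical.

```python
import random
from collections import Counter, defaultdict

N_FOLDS = 5          # mirrors the documented default for n_folds
RANDOM_SEED = 42     # mirrors the documented default for random_seed
MIN_CLASS_SIZE = 10  # mirrors the documented default for min_class_size


def filter_small_classes(X, y, min_class_size=MIN_CLASS_SIZE):
    """Drop samples whose class has fewer than min_class_size members (assumed behavior)."""
    counts = Counter(y)
    keep = [i for i, label in enumerate(y) if counts[label] >= min_class_size]
    return [X[i] for i in keep], [y[i] for i in keep]


def stratified_folds(y, n_folds=N_FOLDS, seed=RANDOM_SEED):
    """Assign every sample index to one fold, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # seeded shuffle keeps the split reproducible
        for pos, i in enumerate(indices):
            folds[pos % n_folds].append(i)
    return folds


def predict_1nn(train_X, train_y, x):
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    best = min(
        range(len(train_X)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(train_X[j], x)),
    )
    return train_y[best]


def cross_validate(X, y, n_folds=N_FOLDS, seed=RANDOM_SEED):
    """Return the accuracy of the 1-NN stand-in classifier on each held-out fold."""
    folds = stratified_folds(y, n_folds, seed)
    scores = []
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(predict_1nn(train_X, train_y, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return scores


# Toy "embedding": two well-separated classes plus one class that is too small.
rng = random.Random(0)
X = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(20)]
X += [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(20)]
X += [[rng.gauss(9, 0.1), rng.gauss(9, 0.1)] for _ in range(3)]
y = ["A"] * 20 + ["B"] * 20 + ["rare"] * 3

X_kept, y_kept = filter_small_classes(X, y)
fold_scores = cross_validate(X_kept, y_kept)
mean_accuracy = sum(fold_scores) / len(fold_scores)
```

In this toy run the three "rare" samples are removed before splitting, so every fold contains examples of both remaining classes. Fixing the seed makes the fold assignment, and therefore the per-fold scores, repeatable across runs.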
- label_key
- n_folds = 5
- random_seed = 42
- min_class_size = 10
- property required_inputs: Set[czbenchmarks.datasets.DataType]
Required input data types.
- Returns:
Set of required input DataTypes (metadata with labels)
- property required_outputs: Set[czbenchmarks.datasets.DataType]
Required output data types.
- Returns:
Set of output DataTypes that models must provide for this task to run (embedding coordinates)
- set_baseline(data: czbenchmarks.datasets.BaseDataset)
Set a baseline embedding using raw gene expression.
Instead of using embeddings from a model, this method uses the raw gene expression matrix as features for classification. This provides a baseline performance to compare against model-generated embeddings for classification tasks.
- Parameters:
data – BaseDataset containing AnnData with gene expression and metadata
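The idea behind the baseline can be illustrated with a small, self-contained sketch: the same classifier is scored once on raw expression values and once on a model embedding, so the two numbers can be compared. Everything here is hypothetical toy data and helper code, not the czbenchmarks API; it also uses leave-one-out 1-nearest-neighbour scoring as a compact stand-in for the task's k-fold evaluation.

```python
def leave_one_out_accuracy(features, labels):
    """1-NN accuracy where each sample is classified using all the other samples."""
    hits = 0
    for i, x in enumerate(features):
        best = min(
            (j for j in range(len(features)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(features[j], x)),
        )
        hits += labels[best] == labels[i]
    return hits / len(features)


# Hypothetical raw counts: rows are cells, columns are genes.
raw_expression = [
    [9, 0, 1, 0], [8, 1, 0, 0], [7, 0, 0, 1],  # cells labelled "T"
    [0, 9, 0, 1], [1, 8, 1, 0], [0, 7, 0, 0],  # cells labelled "B"
]
cell_types = ["T", "T", "T", "B", "B", "B"]

# Hypothetical 2-D model embedding for the same six cells.
model_embedding = [
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.1],
    [0.1, 0.9], [0.2, 0.8], [0.1, 0.7],
]

# Baseline: the raw expression matrix is used directly as the feature matrix.
baseline_score = leave_one_out_accuracy(raw_expression, cell_types)
# Model: the embedding replaces the raw matrix; everything else is unchanged.
model_score = leave_one_out_accuracy(model_embedding, cell_types)
```

Because only the feature matrix changes between the two calls, any difference between model_score and baseline_score can be attributed to the embedding rather than to the classifier or the scoring procedure.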