czbenchmarks.tasks.label_prediction

Attributes

logger

Classes

MetadataLabelPredictionTask

Task for predicting labels from embeddings using cross-validation.

Module Contents

czbenchmarks.tasks.label_prediction.logger
class czbenchmarks.tasks.label_prediction.MetadataLabelPredictionTask(label_key: str, n_folds: int = N_FOLDS, random_seed: int = RANDOM_SEED, min_class_size: int = MIN_CLASS_SIZE)

Bases: czbenchmarks.tasks.base.BaseTask

Task for predicting labels from embeddings using cross-validation.

Evaluates multiple classifiers (Logistic Regression, KNN) using k-fold cross-validation. Reports standard classification metrics.
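The evaluation loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the task's actual implementation: it substitutes a tiny 1-nearest-neighbour classifier for the library's Logistic Regression and KNN classifiers, reports only accuracy, and every function name in it (`stratified_folds`, `predict_1nn`, `cross_validated_accuracy`) is hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=42):
    """Assign every sample index to a fold, spreading each class evenly across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def predict_1nn(train_x, train_y, query):
    """Return the label of the training point closest to `query` (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, query)) for row in train_x]
    return train_y[distances.index(min(distances))]


def cross_validated_accuracy(embeddings, labels, n_folds=5, seed=42):
    """Hold out each fold in turn, train on the rest, and average the held-out accuracy."""
    folds = stratified_folds(labels, n_folds, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        train_x = [embeddings[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        correct = sum(
            predict_1nn(train_x, train_y, embeddings[i]) == labels[i] for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Toy data: two well-separated clusters of 10 points each.
embeddings = [[0.01 * i, 0.0] for i in range(10)] + [[5.0 + 0.01 * i, 5.0] for i in range(10)]
labels = ["A"] * 10 + ["B"] * 10
accuracy = cross_validated_accuracy(embeddings, labels)
```

Fixing the seed keeps the fold assignment reproducible between runs, which is presumably the role of `random_seed` in the task.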

Parameters:
  • label_key – Key to access ground truth labels in metadata

  • n_folds – Number of cross-validation folds

  • random_seed – Random seed for reproducibility

  • min_class_size – Minimum number of samples required per class

label_key
n_folds = 5
random_seed = 42
min_class_size = 10
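As an illustration of what a `min_class_size` threshold can mean in practice, the sketch below keeps only samples whose class has at least that many members. This reference does not say how the task treats classes below the threshold, so excluding them is an assumption, and `filter_small_classes` is a hypothetical helper rather than part of czbenchmarks.

```python
from collections import Counter


def filter_small_classes(labels, min_class_size=10):
    """Return indices of samples whose class has at least `min_class_size` members."""
    counts = Counter(labels)
    return [i for i, label in enumerate(labels) if counts[label] >= min_class_size]


# Toy labels: two classes meet the threshold, one does not.
labels = ["T cell"] * 12 + ["B cell"] * 10 + ["rare"] * 3
kept = filter_small_classes(labels)
kept_classes = sorted({labels[i] for i in kept})
```

A threshold at or above the number of folds also ensures every class can appear in each fold of a stratified split.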
property display_name: str

A human-readable name to use when displaying task results.

property required_inputs: Set[czbenchmarks.datasets.DataType]

Required input data types.

Returns:

Set of required input DataTypes (metadata with labels)

property required_outputs: Set[czbenchmarks.datasets.DataType]

Required output data types.

Returns:

Set of required output DataTypes that a model must produce for this task to run (embedding coordinates)

set_baseline(data: czbenchmarks.datasets.BaseDataset)

Set a baseline embedding using raw gene expression.

Instead of using embeddings from a model, this method uses the raw gene expression matrix as features for classification. This provides a baseline performance to compare against model-generated embeddings for classification tasks.

Parameters:

data – BaseDataset containing AnnData with gene expression and metadata
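The idea behind the baseline can be sketched with a toy container: the raw expression matrix is stored in the slot where a model's embedding would normally go, so the same classification code runs unchanged on either. `ToyDataset`, its fields, and the `"embedding"` key are hypothetical stand-ins for illustration; they are not the czbenchmarks `BaseDataset` API, and the real method may preprocess the matrix in ways not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset; not the czbenchmarks BaseDataset."""

    expression: list  # cells x genes, raw expression values
    outputs: dict = field(default_factory=dict)  # features a task will classify


def set_baseline(dataset):
    """Copy the raw expression matrix into the slot a model embedding would fill."""
    dataset.outputs["embedding"] = [list(row) for row in dataset.expression]


# Two cells, three genes: the baseline "embedding" is just the expression values.
toy = ToyDataset(expression=[[0, 3, 1], [2, 0, 0]])
set_baseline(toy)
```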