czbenchmarks.tasks

Classes

ClusteringTask

Task for evaluating clustering performance against ground truth labels.

EmbeddingTask

Task for evaluating embedding quality using labeled data.

MetadataLabelPredictionTask

Task for predicting labels from embeddings using cross-validation.

BatchIntegrationTask

Task for evaluating batch integration quality.

PerturbationTask

Task for evaluating perturbation prediction quality.

CrossSpeciesIntegrationTask

Task for evaluating cross-species integration quality.

Package Contents

class czbenchmarks.tasks.ClusteringTask(label_key: str, random_seed: int = RANDOM_SEED, n_iterations: int = N_ITERATIONS, flavor: str = FLAVOR, key_added: str = KEY_ADDED)[source]

Bases: czbenchmarks.tasks.base.BaseTask

Task for evaluating clustering performance against ground truth labels.

This task performs clustering on embeddings and evaluates the results using two clustering metrics: the adjusted Rand index (ARI) and normalized mutual information (NMI).
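To make the ARI metric concrete, here is a minimal pure-Python sketch of the adjusted Rand index. It is an illustration of the metric's definition, not the code this task runs; the function name `adjusted_rand_index` is an assumption introduced for the example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)

    # Contingency counts: how many items share a (true label, cluster) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs that fall together in each grouping.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the labelings trivially agree
    expected = sum_true * sum_pred / total_pairs
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        # Degenerate case, e.g. both labelings use a single cluster.
        return 1.0
    return (sum_pairs - expected) / (maximum - expected)


# Cluster ids are arbitrary, so a relabeled but identical partition scores 1.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because the index is corrected for chance, a clustering that is unrelated to the ground truth scores near zero and can go negative.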

Parameters:
  • label_key (str) – Key to access ground truth labels in metadata

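  • random_seed (int) – Random seed for reproducibility

  • n_iterations (int) – Number of iterations for the clustering algorithm

  • flavor (str) – Implementation flavor of the clustering algorithm

  • key_added (str) – Key under which cluster assignments are stored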

label_key
random_seed = 42
n_iterations = 2
flavor = 'igraph'
key_added = 'leiden'
property display_name: str

A pretty name to use when displaying task results

property required_inputs: Set[czbenchmarks.datasets.DataType]

Required input data types.

Returns:

Set of required input DataTypes (metadata with labels)

property required_outputs: Set[czbenchmarks.datasets.DataType]

Required output data types.

Returns:

Set of output types that models must produce for this task to run (embedding to cluster)
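Since required_inputs and required_outputs are plain sets, a caller can check them with set arithmetic. The sketch below assumes a stand-in `DataType` enum with hypothetical members and a hypothetical helper `missing_types`; neither is taken from czbenchmarks.

```python
from enum import Enum, auto


class DataType(Enum):
    """Stand-in for czbenchmarks.datasets.DataType; members are hypothetical."""

    METADATA = auto()
    EMBEDDING = auto()
    ORGANISM = auto()


def missing_types(required, available):
    """Return the required types that are absent from `available`."""
    return set(required) - set(available)


# What a task might declare (hypothetical values).
required_inputs = {DataType.METADATA}
required_outputs = {DataType.EMBEDDING}

# What a dataset and a model might provide (hypothetical values).
dataset_inputs = {DataType.METADATA, DataType.ORGANISM}
model_outputs = set()  # the model has not produced an embedding yet

print(missing_types(required_inputs, dataset_inputs))
print(missing_types(required_outputs, model_outputs))
```

An empty result means every requirement is met; a non-empty result names exactly what is missing.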

class czbenchmarks.tasks.EmbeddingTask(label_key: str)[source]

Bases: czbenchmarks.tasks.base.BaseTask

Task for evaluating embedding quality using labeled data.

This task computes quality metrics for embeddings using ground truth labels. Currently supports silhouette score evaluation.
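As an illustration of what a silhouette score measures, the following pure-Python sketch computes the mean silhouette coefficient with Euclidean distance. It is a sketch of the metric's definition only; the task's own computation may differ, and the function name `silhouette_score` here is local to the example.

```python
from math import dist
from statistics import fmean


def silhouette_score(points, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two distinct labels")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        # The point's zero distance to itself is in the sum, hence n - 1.
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            fmean([dist(point, other) for other in members])
            for other_label, members in clusters.items()
            if other_label != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return fmean(scores)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette_score(points, [0, 0, 1, 1]))  # labels match the geometry
print(silhouette_score(points, [0, 1, 0, 1]))  # labels cut across it
```

Scores close to 1 indicate that cells with the same label sit together in the embedding, while negative scores indicate that labels cut across the embedding's structure.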

Parameters:

label_key (str) – Key to access ground truth labels in metadata

label_key
property display_name: str

A pretty name to use when displaying task results

property required_inputs: Set[czbenchmarks.datasets.DataType]

Required input data types.

Returns:

Set of required input DataTypes (metadata with labels)

property required_outputs: Set[czbenchmarks.datasets.DataType]

Required output data types.

Returns:

Set of output types that models must produce for this task to run (embedding coordinates)

class czbenchmarks.tasks.MetadataLabelPredictionTask(label_key: str, n_folds: int = N_FOLDS, random_seed: int = RANDOM_SEED, min_class_size: int = MIN_CLASS_SIZE)[source]

Bases: czbenchmarks.tasks.base.BaseTask

Task for predicting labels from embeddings using cross-validation.

Evaluates multiple classifiers (Logistic Regression, KNN) using k-fold cross-validation. Reports standard classification metrics.

Parameters:
  • label_key – Key to access ground truth labels in metadata

  • n_folds – Number of cross-validation folds

  • random_seed – Random seed for reproducibility

  • min_class_size – Minimum samples required per class
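The parameters above can be pictured with a small pure-Python sketch of shuffled k-fold splitting. It assumes that classes smaller than min_class_size are excluded before splitting and that folds are not stratified; the source states neither. The helper names `keep_large_classes` and `k_fold_splits` are hypothetical.

```python
import random
from collections import Counter


def keep_large_classes(labels, min_class_size=10):
    """Indices of samples whose class has at least `min_class_size` members."""
    counts = Counter(labels)
    return [i for i, label in enumerate(labels) if counts[label] >= min_class_size]


def k_fold_splits(n_samples, n_folds=5, random_seed=42):
    """Yield (train_indices, test_indices) for shuffled k-fold cross-validation."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    indices = list(range(n_samples))
    # A seeded generator makes the shuffle, and so the folds, reproducible.
    random.Random(random_seed).shuffle(indices)
    # Fold f takes every n_folds-th shuffled index, so sizes differ by at most one.
    folds = [indices[f::n_folds] for f in range(n_folds)]
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test


labels = ["T cell"] * 12 + ["B cell"] * 11 + ["rare"] * 3
kept = keep_large_classes(labels)  # drops the three "rare" samples
for train, test in k_fold_splits(len(kept)):
    print(len(train), len(test))
```

Each sample lands in the test set of exactly one fold, so every classifier is scored on data it was not trained on.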

label_key
n_folds = 5
random_seed = 42
min_class_size = 10
property display_name: str

A pretty name to use when displaying task results

property required_inputs: Set[czbenchmarks.datasets.DataType]

Required input data types.

Returns:

Set of required input DataTypes (metadata with labels)

property required_outputs: Set[czbenchmarks.datasets.DataType]

Required output data types.

Returns:

Set of output types that models must produce for this task to run (embedding coordinates)

set_baseline(data: czbenchmarks.datasets.BaseDataset)[source]

Set a baseline embedding using raw gene expression.

Instead of using embeddings from a model, this method uses the raw gene expression matrix as features for classification. This provides a baseline performance to compare against model-generated embeddings for classification tasks.

Parameters:

data – BaseDataset containing AnnData with gene expression and metadata

class czbenchmarks.tasks.BatchIntegrationTask(label_key: str, batch_key: str)[source]

Bases: czbenchmarks.tasks.base.BaseTask

Task for evaluating batch integration quality.

This task computes metrics to assess how well different batches are integrated in the embedding space while preserving biological signals.

Parameters:
  • label_key – Key to access ground truth cell type labels in metadata

  • batch_key – Key to access batch labels in metadata

label_key
batch_key
property display_name: str

A pretty name to use when displaying task results

property required_inputs: Set[czbenchmarks.datasets.DataType]

Required input data types.

Returns:

Set of required input DataTypes (metadata with labels)

property required_outputs: Set[czbenchmarks.datasets.DataType]

Required output data types.

Returns:

Set of output types that models must produce for this task to run (embedding coordinates)

class czbenchmarks.tasks.PerturbationTask[source]

Bases: czbenchmarks.tasks.base.BaseTask

Task for evaluating perturbation prediction quality.

This task computes metrics to assess how well a model predicts gene expression changes in response to perturbations. It compares predicted and ground-truth perturbation effects using MSE and correlation metrics.

property display_name: str

A pretty name to use when displaying task results

property required_inputs: Set[czbenchmarks.datasets.DataType]

Required input data types.

Returns:

Set of required input DataTypes (ground truth perturbation effects)

property required_outputs: Set[czbenchmarks.datasets.DataType]

Required output data types.

Returns:

Set of output types that models must produce for this task to run (predicted perturbation effects)

set_baseline(data: czbenchmarks.datasets.PerturbationSingleCellDataset, gene_pert: str, baseline_type: Literal['median', 'mean'] = 'median', **kwargs)[source]

Set a baseline embedding for perturbation prediction.

Creates baseline predictions by applying a simple statistic (median or mean, selected by baseline_type) to the control data, and evaluates these predictions against ground truth.

Parameters:
  • data – PerturbationSingleCellDataset containing control and perturbed data

  • gene_pert – The perturbation gene to evaluate

  • baseline_type – The statistical method to use for baseline prediction (median or mean)

  • **kwargs – Additional arguments passed to the evaluation

Returns:

List of MetricResult objects containing baseline performance metrics for different statistical methods (median, mean)
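The baseline idea can be sketched in pure Python: collapse the control cells into one per-gene value with the median or mean, then score that prediction with MSE and Pearson correlation. This is a minimal sketch under assumptions: the matrix layout (rows are cells, columns are genes), the toy numbers, and all function names are hypothetical and not taken from czbenchmarks.

```python
from math import sqrt
from statistics import fmean, median


def baseline_prediction(control, baseline_type="median"):
    """Collapse control cells (rows) into one predicted value per gene (column)."""
    if baseline_type not in ("median", "mean"):
        raise ValueError("baseline_type must be 'median' or 'mean'")
    statistic = median if baseline_type == "median" else fmean
    # zip(*control) walks the matrix column by column, i.e. gene by gene.
    return [statistic(list(column)) for column in zip(*control)]


def mean_squared_error(predicted, truth):
    """Average squared difference between prediction and ground truth."""
    return fmean([(p - t) ** 2 for p, t in zip(predicted, truth)])


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mean_x, mean_y = fmean(x), fmean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return covariance / (spread_x * spread_y)


# Toy data: three control cells, three genes, and one ground-truth profile.
control = [[1.0, 2.0, 3.0], [1.0, 4.0, 3.0], [4.0, 6.0, 9.0]]
truth = [2.0, 4.0, 5.0]
for baseline_type in ("median", "mean"):
    predicted = baseline_prediction(control, baseline_type)
    print(
        baseline_type,
        mean_squared_error(predicted, truth),
        pearson_correlation(predicted, truth),
    )
```

A model is only informative for this task if it beats such a baseline, which uses no knowledge of the perturbation at all.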

class czbenchmarks.tasks.CrossSpeciesIntegrationTask(label_key: str)[source]

Bases: czbenchmarks.tasks.base.BaseTask

Task for evaluating cross-species integration quality.

This task computes metrics to assess how well different species’ data are integrated in the embedding space while preserving biological signals. It operates on multiple datasets from different species.

Parameters:

label_key – Key to access ground truth cell type labels in metadata

label_key
property display_name: str

A pretty name to use when displaying task results

property required_inputs: Set[czbenchmarks.datasets.DataType]

Required input data types.

Returns:

Set of required input DataTypes (metadata with labels)

property required_outputs: Set[czbenchmarks.datasets.DataType]

Required output data types.

Returns:

Set of output types that models must produce for this task to run (embedding coordinates)

property requires_multiple_datasets: bool

Whether this task requires multiple datasets.

Returns:

True, as this task compares data across species

abstract set_baseline(data: List[czbenchmarks.datasets.SingleCellDataset], **kwargs)[source]

Set a baseline embedding for cross-species integration.

This method is not implemented for cross-species integration tasks as standard preprocessing workflows are not directly applicable across different species.

Parameters:
  • data – List of SingleCellDataset objects from different species

  • **kwargs – Additional arguments passed to run_standard_scrna_workflow

Raises:

NotImplementedError – Always raised as baseline is not implemented
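Because set_baseline always raises here, code that loops over several tasks may want to treat the baseline as optional. The sketch below shows one way to do that with a hypothetical stand-in class that mimics the documented behaviour; `CrossSpeciesTaskStandIn` and `try_set_baseline` are assumptions for illustration, not part of czbenchmarks.

```python
class CrossSpeciesTaskStandIn:
    """Hypothetical stand-in that mimics the documented set_baseline behaviour."""

    def set_baseline(self, data, **kwargs):
        raise NotImplementedError(
            "Baseline not implemented for cross-species integration"
        )


def try_set_baseline(task, data, **kwargs):
    """Return True if a baseline was set, False if the task does not support one."""
    try:
        task.set_baseline(data, **kwargs)
    except NotImplementedError:
        return False
    return True


print(try_set_baseline(CrossSpeciesTaskStandIn(), data=[]))
```

Catching only NotImplementedError keeps genuine failures, such as malformed input data, visible to the caller.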