czbenchmarks.tasks.base

Classes

BaseTask

Abstract base class for all benchmark tasks.

Module Contents

class czbenchmarks.tasks.base.BaseTask[source]

Bases: abc.ABC

Abstract base class for all benchmark tasks.

Defines the interface that all tasks must implement. Tasks are responsible for:

1. Declaring their required input/output data types

2. Running task-specific computations

3. Computing evaluation metrics

Tasks should store any intermediate results as instance variables to be used in metric computation.
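For illustration, a minimal sketch of a concrete subclass is shown below. The class name, the DataType members used, and the note about additional hooks are assumptions made for this example, not part of the documented API.

    from typing import Set

    from czbenchmarks.datasets import DataType
    from czbenchmarks.tasks.base import BaseTask


    class ExampleEmbeddingTask(BaseTask):
        """Hypothetical task used only to illustrate the abstract interface."""

        @property
        def display_name(self) -> str:
            # Shown when task results are displayed.
            return "Example Embedding Task"

        @property
        def required_inputs(self) -> Set[DataType]:
            # DataType member names here are illustrative assumptions.
            return {DataType.ANNDATA}

        @property
        def required_outputs(self) -> Set[DataType]:
            # This task expects models to have produced an embedding.
            return {DataType.EMBEDDING}

        # A concrete task must also implement its task-specific computation and
        # metric hooks (not documented on this page), storing any intermediate
        # results as instance variables for metric computation.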

property display_name: str
Abstractmethod:

A pretty name to use when displaying task results

property required_inputs: Set[czbenchmarks.datasets.DataType]
Abstractmethod:

The input data types this task requires.

Returns:

Set of DataType enums that must be present in input data

property required_outputs: Set[czbenchmarks.datasets.DataType]
Abstractmethod:

The model output data types this task requires.

Returns:

Set of DataType enums that must be present in output data

property requires_multiple_datasets: bool

Whether this task requires multiple datasets

validate(data: czbenchmarks.datasets.BaseDataset)[source]
set_baseline(data: czbenchmarks.datasets.BaseDataset, **kwargs)[source]

Set a baseline embedding using PCA on gene expression data.

This method performs standard preprocessing on the raw gene expression data and uses PCA for dimensionality reduction. It then sets the PCA embedding as the BASELINE model output in the dataset, which can be used for comparison with other model embeddings.

Parameters:
  • data – BaseDataset containing AnnData with gene expression data

  • **kwargs – Additional arguments passed to run_standard_scrna_workflow
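A brief usage sketch, assuming `dataset` is a loaded BaseDataset whose AnnData holds raw gene expression and continuing the ExampleEmbeddingTask example above:

    # Hypothetical usage: establish a PCA baseline before running the task.
    task = ExampleEmbeddingTask()

    # Preprocess the raw expression data, compute a PCA embedding, and store it
    # as the BASELINE model output on the dataset; any extra keyword arguments
    # are forwarded to run_standard_scrna_workflow.
    task.set_baseline(dataset)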

run(data: czbenchmarks.datasets.BaseDataset | List[czbenchmarks.datasets.BaseDataset], model_types: List[czbenchmarks.models.types.ModelType] | None = None) Dict[czbenchmarks.models.types.ModelType, List[czbenchmarks.metrics.types.MetricResult]] | List[Dict[czbenchmarks.models.types.ModelType, List[czbenchmarks.metrics.types.MetricResult]]][source]

Run the task on input data and compute metrics.

Parameters:

data – Single dataset or list of datasets to evaluate. Must contain required input and output data types.

Returns:

For a single dataset: a dictionary mapping model types to metric results. For multiple datasets: a list of such dictionaries, one per dataset.

Return type:

Dict[ModelType, List[MetricResult]] | List[Dict[ModelType, List[MetricResult]]]

Raises:
  • ValueError – If data is invalid type or missing required fields

  • ValueError – If task requires multiple datasets but single dataset provided
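A brief usage sketch of run(), continuing the example above; `dataset`, `dataset_a`, and `dataset_b` are assumed to be loaded BaseDataset instances that already contain the required inputs and model outputs.

    # Single dataset: one dictionary mapping model types to metric results.
    metrics_by_model = task.run(dataset)
    for model_type, metric_results in metrics_by_model.items():
        print(model_type, metric_results)

    # Multiple datasets: a list of such dictionaries, one per dataset.
    all_results = task.run([dataset_a, dataset_b])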