czbenchmarks.tasks.task

Attributes

TASK_REGISTRY

Classes

TaskInput

Base class for task inputs.

TaskOutput

Base class for task outputs.

TaskParameter

Schema for a single, discoverable parameter.

TaskInfo

Schema for all discoverable information about a single benchmark task.

TaskRegistry

A registry that is populated automatically as Task subclasses are defined.

Task

Abstract base class for all benchmark tasks.

Module Contents

class czbenchmarks.tasks.task.TaskInput(/, **data: Any)[source]

Bases: pydantic.BaseModel

Base class for task inputs.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class czbenchmarks.tasks.task.TaskOutput(/, **data: Any)[source]

Bases: pydantic.BaseModel

Base class for task outputs.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
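
A minimal sketch of how a concrete task might declare its own input and output schemas by subclassing these base models. The class and field names below (ClusteringTaskInput, ClusteringTaskOutput, ground_truth_labels, n_clusters, predicted_labels) are hypothetical and serve only to illustrate the Pydantic validation behavior described above:

    from typing import List

    from czbenchmarks.tasks.task import TaskInput, TaskOutput


    class ClusteringTaskInput(TaskInput):
        # Hypothetical fields a clustering-style task might require.
        ground_truth_labels: List[str]
        n_clusters: int = 10


    class ClusteringTaskOutput(TaskOutput):
        # Hypothetical field holding the per-cell cluster assignment.
        predicted_labels: List[int]


    # Keyword arguments are parsed and validated on construction; a missing
    # required field or a wrong type raises pydantic_core.ValidationError.
    task_input = ClusteringTaskInput(
        ground_truth_labels=["B cell", "T cell"],
        n_clusters=2,
    )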

class czbenchmarks.tasks.task.TaskParameter(/, **data: Any)[source]

Bases: pydantic.BaseModel

Schema for a single, discoverable parameter.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

type: Any
stringified_type: str
default: Any = None
required: bool
class czbenchmarks.tasks.task.TaskInfo(/, **data: Any)[source]

Bases: pydantic.BaseModel

Schema for all discoverable information about a single benchmark task.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

name: str
display_name: str
description: str
task_params: Dict[str, TaskParameter]
baseline_params: Dict[str, TaskParameter]
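
TaskParameter and TaskInfo instances are normally produced by the registry's introspection (see TaskRegistry.register_task below) rather than built by hand, but constructing them directly shows the shape of the schemas. All values in this sketch are hypothetical:

    from czbenchmarks.tasks.task import TaskInfo, TaskParameter

    # Describe one hypothetical parameter of a task.
    n_clusters_param = TaskParameter(
        type=int,                # Python type of the parameter
        stringified_type="int",  # human-readable type name
        default=10,              # default value, if any
        required=False,          # whether the caller must supply it
    )

    # Bundle everything the registry exposes about one task.
    info = TaskInfo(
        name="clustering",
        display_name="Clustering",
        description="Cluster cells and score agreement with ground-truth labels.",
        task_params={"n_clusters": n_clusters_param},
        baseline_params={},
    )

    print(info.task_params["n_clusters"].stringified_type)  # "int"
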
class czbenchmarks.tasks.task.TaskRegistry[source]

A registry that is populated automatically as Task subclasses are defined.

register_task(task_class: type[Task])[source]

Registers a task class and introspects it to gather metadata.

list_tasks() → List[str][source]

Returns a list of all available task names.

get_task_info(task_name: str) → TaskInfo[source]

Gets all introspected information for a given task.

get_task_class(task_name: str) → Type[Task][source]

Gets the class for a given task name.

get_task_help(task_name: str) → str[source]

Generate detailed help text for a specific task.

validate_task_input(task_name: str, parameters: Dict[str, Any]) → None[source]

Strictly validate parameters using the Pydantic input model.

validate_task_parameters(task_name: str, parameters: Dict[str, Any]) → List[str][source]

Validate parameters for a task and return list of error messages.

czbenchmarks.tasks.task.TASK_REGISTRY
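
A sketch of querying the module-level TASK_REGISTRY using the methods documented above; the task name "clustering" and the parameter dictionary are hypothetical examples:

    from czbenchmarks.tasks.task import TASK_REGISTRY

    # Any Task subclass that has been imported registers itself automatically,
    # so this lists every task currently known to the process.
    print(TASK_REGISTRY.list_tasks())

    # Inspect one task's metadata and render its detailed help text
    # ("clustering" is a hypothetical task name).
    info = TASK_REGISTRY.get_task_info("clustering")
    print(TASK_REGISTRY.get_task_help("clustering"))

    # Lenient validation: collect human-readable error messages.
    errors = TASK_REGISTRY.validate_task_parameters("clustering", {"n_clusters": 10})
    print(errors)

    # Strict validation against the task's Pydantic input model (returns None).
    TASK_REGISTRY.validate_task_input("clustering", {"n_clusters": 10})
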
class czbenchmarks.tasks.task.Task(*, random_seed: int = RANDOM_SEED)[source]

Bases: abc.ABC

Abstract base class for all benchmark tasks.

Defines the interface that all tasks must implement. Tasks are responsible for:

1. Declaring their required input/output data types
2. Running task-specific computations
3. Computing evaluation metrics

Tasks should store any intermediate results as instance variables to be used in metric computation.

Parameters:

random_seed (int) – Random seed for reproducibility

random_seed = 42
requires_multiple_datasets = False
classmethod __init_subclass__(**kwargs)[source]

Automatically register task subclasses when they are defined.
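
The automatic registration relies on Python's __init_subclass__ hook. The following is a generic sketch of that pattern, not the actual czbenchmarks implementation, to show why merely defining (or importing) a Task subclass is enough for it to appear in the registry:

    from abc import ABC
    from typing import Dict, Type

    # Stand-in for the real TASK_REGISTRY; details of the actual implementation
    # may differ.
    _registry: Dict[str, Type] = {}


    class RegisteredBase(ABC):
        @classmethod
        def __init_subclass__(cls, **kwargs):
            super().__init_subclass__(**kwargs)
            # Runs as soon as a subclass body finishes executing, so importing
            # the module that defines the subclass is enough to register it.
            _registry[cls.__name__] = cls


    class MyTask(RegisteredBase):
        pass


    assert "MyTask" in _registry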

compute_baseline(expression_data: czbenchmarks.tasks.types.CellRepresentation, **kwargs) → czbenchmarks.tasks.types.CellRepresentation[source]

Set a baseline embedding using PCA on gene expression data.

This method performs standard preprocessing on the raw gene expression data and uses PCA for dimensionality reduction. It then sets the PCA embedding as the BASELINE model output in the dataset, which can be used for comparison with other model embeddings.

Parameters:
  • expression_data – Raw gene expression data used to build the AnnData object for preprocessing and PCA

  • **kwargs – Additional arguments passed to run_standard_scrna_workflow
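
A hedged sketch of using compute_baseline, assuming a NumPy array is an acceptable CellRepresentation; the helper name pca_baseline and the usage variables are hypothetical. Per the signature, the method returns the resulting PCA embedding as a CellRepresentation:

    import numpy as np

    from czbenchmarks.tasks.task import Task


    def pca_baseline(task: Task, raw_counts: np.ndarray):
        """Preprocess raw counts and reduce them with PCA via the task's baseline."""
        # Extra keyword arguments would be forwarded to run_standard_scrna_workflow.
        return task.compute_baseline(raw_counts)


    # Usage (hypothetical): `task` is any concrete Task instance and `counts`
    # a raw cells-by-genes expression matrix.
    # baseline_embedding = pca_baseline(task, counts)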

run(cell_representation: czbenchmarks.tasks.types.CellRepresentation | List[czbenchmarks.tasks.types.CellRepresentation], task_input: TaskInput) → List[czbenchmarks.metrics.types.MetricResult][source]

Run the task on input data and compute metrics.

Parameters:
  • cell_representation – gene expression data or embedding to use for the task

  • task_input – Pydantic model with inputs for the task

Returns:

For a single embedding: a one-element list containing the metric result for the task. For multiple embeddings: a list of metric results, one per dataset.

Return type:

List[czbenchmarks.metrics.types.MetricResult]

Raises:

ValueError – If the input does not match the task's multiple-embedding requirement (see requires_multiple_datasets)
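
A hedged end-to-end sketch of Task.run. ClusteringTask and ClusteringTaskInput are hypothetical stand-ins for a concrete task and its TaskInput model, and the embedding and labels are synthetic:

    import numpy as np

    # Hypothetical model-produced cell embedding and ground-truth annotations.
    embedding = np.random.rand(500, 64)
    labels = ["B cell"] * 250 + ["T cell"] * 250

    task = ClusteringTask(random_seed=42)                          # hypothetical Task subclass
    task_input = ClusteringTaskInput(ground_truth_labels=labels)   # hypothetical TaskInput

    # For a single embedding this returns a one-element list of MetricResult;
    # tasks with requires_multiple_datasets = True take a list of embeddings
    # instead and return one MetricResult per dataset.
    results = task.run(cell_representation=embedding, task_input=task_input)
    for metric_result in results:
        print(metric_result)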