czbenchmarks.tasks.task

Attributes

TASK_REGISTRY

Classes

TaskInput

Base class for task inputs.

TaskOutput

Base class for task outputs.

TaskParameter

Schema for a single, discoverable parameter.

TaskInfo

Schema for all discoverable information about a single benchmark task.

TaskRegistry

A registry that is populated automatically as Task subclasses are defined.

Task

Abstract base class for all benchmark tasks.

Module Contents

class czbenchmarks.tasks.task.TaskInput(/, **data: Any)[source]

Bases: pydantic.BaseModel

Base class for task inputs.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class czbenchmarks.tasks.task.TaskOutput(/, **data: Any)[source]

Bases: pydantic.BaseModel

Base class for task outputs.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
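
A minimal sketch of how a concrete task might declare its own input and output schemas by subclassing these base models. The class and field names below (ClusteringTaskInput, ClusteringTaskOutput, ground_truth_labels, n_clusters, predicted_labels) are hypothetical and serve only to illustrate the Pydantic validation behavior described above:

    from typing import List

    from czbenchmarks.tasks.task import TaskInput, TaskOutput


    class ClusteringTaskInput(TaskInput):
        # Hypothetical fields a clustering-style task might require.
        ground_truth_labels: List[str]
        n_clusters: int = 10


    class ClusteringTaskOutput(TaskOutput):
        # Hypothetical field holding the per-cell cluster assignment.
        predicted_labels: List[int]


    # Keyword arguments are parsed and validated on construction; a missing
    # required field or a wrong type raises pydantic_core.ValidationError.
    task_input = ClusteringTaskInput(
        ground_truth_labels=["B cell", "T cell"],
        n_clusters=2,
    )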

class czbenchmarks.tasks.task.TaskParameter(/, **data: Any)[source]

Bases: pydantic.BaseModel

Schema for a single, discoverable parameter.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

type: Any
stringified_type: str
default: Any = None
required: bool
class czbenchmarks.tasks.task.TaskInfo(/, **data: Any)[source]

Bases: pydantic.BaseModel

Schema for all discoverable information about a single benchmark task.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

name: str
display_name: str
description: str
task_params: Dict[str, TaskParameter]
baseline_params: Dict[str, TaskParameter]
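
TaskParameter and TaskInfo instances are normally produced by the registry's introspection (see TaskRegistry.register_task below) rather than built by hand, but constructing them directly shows the shape of the schemas. All values in this sketch are hypothetical:

    from czbenchmarks.tasks.task import TaskInfo, TaskParameter

    # Describe one hypothetical parameter of a task.
    n_clusters_param = TaskParameter(
        type=int,                # Python type of the parameter
        stringified_type="int",  # human-readable type name
        default=10,              # default value, if any
        required=False,          # whether the caller must supply it
    )

    # Bundle everything the registry exposes about one task.
    info = TaskInfo(
        name="clustering",
        display_name="Clustering",
        description="Cluster cells and score agreement with ground-truth labels.",
        task_params={"n_clusters": n_clusters_param},
        baseline_params={},
    )

    print(info.task_params["n_clusters"].stringified_type)  # "int"
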
class czbenchmarks.tasks.task.TaskRegistry[source]

A registry that is populated automatically as Task subclasses are defined.

register_task(task_class: type[Task])[source]

Registers a task class and introspects it to gather metadata.

list_tasks() → List[str][source]

Returns a list of all available task names.

get_task_info(task_name: str) → TaskInfo[source]

Gets all introspected information for a given task.

get_task_class(task_name: str) → Type[Task][source]

Gets the class for a given task name.

get_task_help(task_name: str) → str[source]

Generate detailed help text for a specific task.

validate_task_input(task_name: str, parameters: Dict[str, Any]) → None[source]

Strictly validate parameters using the Pydantic input model.

validate_task_parameters(task_name: str, parameters: Dict[str, Any]) → List[str][source]

Validate parameters for a task and return list of error messages.

czbenchmarks.tasks.task.TASK_REGISTRY
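
A sketch of querying the module-level TASK_REGISTRY using the methods documented above; the task name "clustering" and the parameter dictionary are hypothetical examples:

    from czbenchmarks.tasks.task import TASK_REGISTRY

    # Any Task subclass that has been imported registers itself automatically,
    # so this lists every task currently known to the process.
    print(TASK_REGISTRY.list_tasks())

    # Inspect one task's metadata and render its detailed help text
    # ("clustering" is a hypothetical task name).
    info = TASK_REGISTRY.get_task_info("clustering")
    print(TASK_REGISTRY.get_task_help("clustering"))

    # Lenient validation: collect human-readable error messages.
    errors = TASK_REGISTRY.validate_task_parameters("clustering", {"n_clusters": 10})
    print(errors)

    # Strict validation against the task's Pydantic input model (returns None).
    TASK_REGISTRY.validate_task_input("clustering", {"n_clusters": 10})
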
class czbenchmarks.tasks.task.Task(*, random_seed: int = RANDOM_SEED)[source]

Bases: abc.ABC

Abstract base class for all benchmark tasks.

Defines the interface that all tasks must implement. Tasks are responsible for:

1. Declaring their required input/output data types
2. Running task-specific computations
3. Computing evaluation metrics

Tasks should store any intermediate results as instance variables to be used in metric computation.

Parameters:

random_seed (int) – Random seed for reproducibility

random_seed = 42
requires_multiple_datasets = False
classmethod __init_subclass__(**kwargs)[source]

Automatically register task subclasses when they are defined.
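
The automatic registration relies on Python's __init_subclass__ hook. The following is a generic sketch of that pattern, not the actual czbenchmarks implementation, to show why merely defining (or importing) a Task subclass is enough for it to appear in the registry:

    from abc import ABC
    from typing import Dict, Type

    # Stand-in for the real TASK_REGISTRY; details of the actual implementation
    # may differ.
    _registry: Dict[str, Type] = {}


    class RegisteredBase(ABC):
        @classmethod
        def __init_subclass__(cls, **kwargs):
            super().__init_subclass__(**kwargs)
            # Runs as soon as a subclass body finishes executing, so importing
            # the module that defines the subclass is enough to register it.
            _registry[cls.__name__] = cls


    class MyTask(RegisteredBase):
        pass


    assert "MyTask" in _registry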

compute_baseline(expression_data: czbenchmarks.tasks.types.CellRepresentation, **kwargs) → czbenchmarks.tasks.types.CellRepresentation[source]

Set a baseline embedding using PCA on gene expression data.

This method performs standard preprocessing on the raw gene expression data and uses PCA for dimensionality reduction. It then sets the PCA embedding as the BASELINE model output in the dataset, which can be used for comparison with other model embeddings.

Parameters:
  • expression_data – Raw gene expression data used to build the AnnData object for preprocessing and PCA

  • **kwargs – Additional arguments passed to run_standard_scrna_workflow
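
A hedged sketch of using compute_baseline, assuming a NumPy array is an acceptable CellRepresentation; the helper name pca_baseline and the usage variables are hypothetical. Per the signature, the method returns the resulting PCA embedding as a CellRepresentation:

    import numpy as np

    from czbenchmarks.tasks.task import Task


    def pca_baseline(task: Task, raw_counts: np.ndarray):
        """Preprocess raw counts and reduce them with PCA via the task's baseline."""
        # Extra keyword arguments would be forwarded to run_standard_scrna_workflow.
        return task.compute_baseline(raw_counts)


    # Usage (hypothetical): `task` is any concrete Task instance and `counts`
    # a raw cells-by-genes expression matrix.
    # baseline_embedding = pca_baseline(task, counts)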

run(cell_representation: czbenchmarks.tasks.types.CellRepresentation | List[czbenchmarks.tasks.types.CellRepresentation], task_input: TaskInput) → List[czbenchmarks.metrics.types.MetricResult][source]

Run the task on input data and compute metrics.

Parameters:
  • cell_representation – gene expression data or embedding to use for the task

  • task_input – Pydantic model with inputs for the task

Returns:

For a single embedding: a one-element list containing the metric result for the task. For multiple embeddings: a list of metric results, one per dataset.

Return type:

List[czbenchmarks.metrics.types.MetricResult]

Raises:

ValueError – If the input does not match the task's multiple-embedding requirement (see requires_multiple_datasets)
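
A hedged end-to-end sketch of Task.run. ClusteringTask and ClusteringTaskInput are hypothetical stand-ins for a concrete task and its TaskInput model, and the embedding and labels are synthetic:

    import numpy as np

    # Hypothetical model-produced cell embedding and ground-truth annotations.
    embedding = np.random.rand(500, 64)
    labels = ["B cell"] * 250 + ["T cell"] * 250

    task = ClusteringTask(random_seed=42)                          # hypothetical Task subclass
    task_input = ClusteringTaskInput(ground_truth_labels=labels)   # hypothetical TaskInput

    # For a single embedding this returns a one-element list of MetricResult;
    # tasks with requires_multiple_datasets = True take a list of embeddings
    # instead and return one MetricResult per dataset.
    results = task.run(cell_representation=embedding, task_input=task_input)
    for metric_result in results:
        print(metric_result)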