czbenchmarks.tasks.task
=======================

.. py:module:: czbenchmarks.tasks.task


Attributes
----------

.. autoapisummary::

   czbenchmarks.tasks.task.TASK_REGISTRY


Classes
-------

.. autoapisummary::

   czbenchmarks.tasks.task.TaskInput
   czbenchmarks.tasks.task.TaskOutput
   czbenchmarks.tasks.task.TaskParameter
   czbenchmarks.tasks.task.TaskInfo
   czbenchmarks.tasks.task.TaskRegistry
   czbenchmarks.tasks.task.Task


Module Contents
---------------

.. py:class:: TaskInput(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`


   Base class for task inputs.

   Create a new model by parsing and validating input data from keyword arguments.

   Raises :py:exc:`pydantic_core.ValidationError` if the input data cannot be
   validated to form a valid model.

   ``self`` is explicitly positional-only to allow ``self`` as a field name.


   .. py:attribute:: model_config

      Configuration for the model; it should be a dictionary conforming to :py:class:`pydantic.config.ConfigDict`.


.. py:class:: TaskOutput(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`


   Base class for task outputs.

   Create a new model by parsing and validating input data from keyword arguments.

   Raises :py:exc:`pydantic_core.ValidationError` if the input data cannot be
   validated to form a valid model.

   ``self`` is explicitly positional-only to allow ``self`` as a field name.


   .. py:attribute:: model_config

      Configuration for the model; it should be a dictionary conforming to :py:class:`pydantic.config.ConfigDict`.


.. py:class:: TaskParameter(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`


   Schema for a single, discoverable parameter.

   Create a new model by parsing and validating input data from keyword arguments.

   Raises :py:exc:`pydantic_core.ValidationError` if the input data cannot be
   validated to form a valid model.

   ``self`` is explicitly positional-only to allow ``self`` as a field name.


   .. py:attribute:: type
      :type:  Any


   .. py:attribute:: stringified_type
      :type:  str


   .. py:attribute:: default
      :type:  Any
      :value: None


   .. py:attribute:: required
      :type:  bool


.. py:class:: TaskInfo(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`


   Schema for all discoverable information about a single benchmark task.

   Create a new model by parsing and validating input data from keyword arguments.

   Raises :py:exc:`pydantic_core.ValidationError` if the input data cannot be
   validated to form a valid model.

   ``self`` is explicitly positional-only to allow ``self`` as a field name.


   .. py:attribute:: name
      :type:  str


   .. py:attribute:: display_name
      :type:  str


   .. py:attribute:: description
      :type:  str


   .. py:attribute:: task_params
      :type:  Dict[str, TaskParameter]


   .. py:attribute:: baseline_params
      :type:  Dict[str, TaskParameter]


.. py:class:: TaskRegistry

   A registry that is populated automatically as Task subclasses are defined.


   .. py:method:: register_task(task_class: type[Task])

      Registers a task class and introspects it to gather metadata.
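
      This page does not say how the metadata is gathered. As a rough
      illustration only, the sketch below builds records with the same four
      fields as ``TaskParameter`` from a function signature using the standard
      library. ``ParamSketch``, ``describe_params`` and ``example_baseline`` are
      hypothetical names, and reading a function signature is an assumption;
      the library may instead read the fields of a Pydantic model.

      .. code-block:: python

         import inspect
         from dataclasses import dataclass
         from typing import Any, Dict


         @dataclass
         class ParamSketch:
             """Hypothetical stand-in mirroring the fields of TaskParameter."""

             type: Any
             stringified_type: str
             default: Any = None
             required: bool = True


         def describe_params(func) -> Dict[str, ParamSketch]:
             """Collect one ParamSketch per named parameter of ``func``."""
             params: Dict[str, ParamSketch] = {}
             for name, p in inspect.signature(func).parameters.items():
                 # Skip ``self`` and catch-all parameters such as *args/**kwargs.
                 if name == "self" or p.kind in (p.VAR_POSITIONAL, p.VAR_KEYWORD):
                     continue
                 # A parameter without a default must be supplied by the caller.
                 required = p.default is inspect.Parameter.empty
                 annotation = (
                     Any if p.annotation is inspect.Parameter.empty else p.annotation
                 )
                 params[name] = ParamSketch(
                     type=annotation,
                     stringified_type=getattr(annotation, "__name__", str(annotation)),
                     default=None if required else p.default,
                     required=required,
                 )
             return params


         def example_baseline(expression_data, n_pcs: int = 50, scale: bool = False):
             """Hypothetical function whose signature is introspected."""


         info = describe_params(example_baseline)

      Here ``info["expression_data"].required`` is ``True`` because the
      parameter has no default, while ``info["n_pcs"]`` records the default
      ``50`` and is not required.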


   .. py:method:: list_tasks() -> List[str]

      Returns a list of all available task names.


   .. py:method:: get_task_info(task_name: str) -> TaskInfo

      Gets all introspected information for a given task.


   .. py:method:: get_task_class(task_name: str) -> Type[Task]

      Gets the class for a given task name.


   .. py:method:: get_task_help(task_name: str) -> str

      Generate detailed help text for a specific task.


   .. py:method:: validate_task_input(task_name: str, parameters: Dict[str, Any]) -> None

      Strictly validate parameters using the Pydantic input model.


   .. py:method:: validate_task_parameters(task_name: str, parameters: Dict[str, Any]) -> List[str]

      Validate parameters for a task and return list of error messages.


.. py:data:: TASK_REGISTRY

.. py:class:: Task(*, random_seed: int = RANDOM_SEED)

   Bases: :py:obj:`abc.ABC`


   Abstract base class for all benchmark tasks.

   Defines the interface that all tasks must implement. Tasks are responsible for:

   1. Declaring their required input/output data types
   2. Running task-specific computations
   3. Computing evaluation metrics

   Tasks should store any intermediate results as instance variables
   to be used in metric computation.

   :param random_seed: Random seed for reproducibility
   :type random_seed: int


   .. py:attribute:: random_seed
      :value: 42


   .. py:attribute:: requires_multiple_datasets
      :value: False


   .. py:method:: __init_subclass__(**kwargs)
      :classmethod:


      Automatically register task subclasses when they are defined.
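
      The general pattern can be sketched with plain Python classes.
      ``RegistrySketch``, ``TaskSketch`` and the two subclasses are hypothetical
      stand-ins, and keying the registry by class name is an assumption made for
      illustration; it is not taken from the library.

      .. code-block:: python

         from typing import Dict, List


         class RegistrySketch:
             """Hypothetical stand-in for TaskRegistry; stores classes by name."""

             def __init__(self) -> None:
                 self._tasks: Dict[str, type] = {}

             def register_task(self, task_class: type) -> None:
                 # Assumption: the class name serves as the lookup key.
                 self._tasks[task_class.__name__] = task_class

             def list_tasks(self) -> List[str]:
                 return sorted(self._tasks)


         REGISTRY_SKETCH = RegistrySketch()


         class TaskSketch:
             """Hypothetical base class; subclasses register on definition."""

             def __init_subclass__(cls, **kwargs):
                 super().__init_subclass__(**kwargs)
                 # Runs once per subclass, at class-definition time.
                 REGISTRY_SKETCH.register_task(cls)


         class ClusteringSketch(TaskSketch):
             pass


         class LabelPredictionSketch(TaskSketch):
             pass


         registered = REGISTRY_SKETCH.list_tasks()

      Because ``__init_subclass__`` is called only for subclasses, the base
      class itself is not registered, and ``registered`` holds just the two
      subclass names.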


   .. py:method:: compute_baseline(expression_data: czbenchmarks.tasks.types.CellRepresentation, **kwargs) -> czbenchmarks.tasks.types.CellRepresentation

      Set a baseline embedding using PCA on gene expression data.

      This method performs standard preprocessing on the raw gene expression data
      and uses PCA for dimensionality reduction. It then sets the PCA embedding
      as the BASELINE model output in the dataset, which can be used for comparison
      with other model embeddings.

      :param expression_data: Expression data used to build the AnnData object
      :param \*\*kwargs: Additional arguments passed to ``run_standard_scrna_workflow``


   .. py:method:: run(cell_representation: Union[czbenchmarks.tasks.types.CellRepresentation, List[czbenchmarks.tasks.types.CellRepresentation]], task_input: TaskInput) -> List[czbenchmarks.metrics.types.MetricResult]

      Run the task on input data and compute metrics.

      :param cell_representation: Gene expression data or embedding to use for the task
      :param task_input: Pydantic model with inputs for the task

      :returns: For a single embedding, a one-element list containing a single metric
                result for the task. For multiple embeddings, a list of metric results,
                one per dataset.
      :rtype: List[czbenchmarks.metrics.types.MetricResult]

      :raises ValueError: If the input does not match the task's multiple-embedding requirement
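
      One way the single-versus-multiple check could look is sketched below.
      ``check_representation`` is a hypothetical helper, and testing for a
      Python ``list`` against the ``requires_multiple_datasets`` flag is an
      assumption based on the signature above, not the library's confirmed
      logic.

      .. code-block:: python

         from typing import Any, List, Union


         def check_representation(
             cell_representation: Union[Any, List[Any]],
             requires_multiple_datasets: bool,
         ) -> None:
             """Raise ValueError when the input shape contradicts the flag."""
             is_list = isinstance(cell_representation, list)
             if requires_multiple_datasets and not is_list:
                 raise ValueError("This task requires a list of cell representations")
             if not requires_multiple_datasets and is_list:
                 raise ValueError("This task requires a single cell representation")


         # A tuple stands in for a single embedding; a list holds several of them.
         check_representation(("embedding",), requires_multiple_datasets=False)
         check_representation([("a",), ("b",)], requires_multiple_datasets=True)

      Both calls return without error. Passing a list to a single-dataset task,
      or a single representation to a multi-dataset task, raises ``ValueError``
      in this sketch.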