Add a Custom Task
This guide explains how to create and integrate your own evaluation task into cz-benchmarks.
Steps to Add a New Task
1. Create a New Task Class
Extend the `BaseTask` class to define your custom task. Create a new Python file in the `czbenchmarks/tasks/` directory (or in a subdirectory if the task is modality-specific). Example:
```python
from typing import List, Set

from czbenchmarks.tasks.base import BaseTask
from czbenchmarks.datasets import DataType
from czbenchmarks.metrics.types import MetricResult
from czbenchmarks.models.types import ModelType


class MyTask(BaseTask):
    def __init__(self, my_param: int = 10):
        self.my_param = my_param

    @property
    def display_name(self) -> str:
        return "My Example Task"

    @property
    def required_inputs(self) -> Set[DataType]:
        return {DataType.METADATA}

    @property
    def required_outputs(self) -> Set[DataType]:
        return {DataType.EMBEDDING}

    def _run_task(self, data, model_type: ModelType):
        # Retrieve necessary data
        self.embedding = data.get_output(model_type, DataType.EMBEDDING)
        self.labels = data.get_input(DataType.METADATA)["cell_type"]
        # Implement your task logic here
        # e.g., calculate some predictions or transform embeddings

    def _compute_metrics(self) -> List[MetricResult]:
        # Compute and return a list of MetricResult objects
        metric_value = ...  # Replace with actual computation
        return [MetricResult(metric_type="my_metric", value=metric_value)]
```
Consult `czbenchmarks/tasks/base.py` for interface details and best practices. Use `display_name` to provide a human-readable name shown when your task is displayed.
2. Declare Required Inputs and Outputs
Use the `required_inputs` and `required_outputs` properties to specify the data types your task needs:
`required_inputs`: data your task consumes (e.g., `DataType.METADATA`).
`required_outputs`: data your task produces or depends on (e.g., `DataType.EMBEDDING`).
3. Implement the Task Logic
Implement the `_run_task` method to define the core logic of your task. Use `data.get_input()` and `data.get_output()` to retrieve the necessary inputs and outputs.
4. Compute Metrics
Use the `_compute_metrics` method to calculate and return metrics as a list of `MetricResult` objects. If your task requires a `Metric` that is not already supported by cz-benchmarks, refer to Adding a New Metric.
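As a concrete illustration of what `_compute_metrics` might calculate, the sketch below scores predicted labels against reference labels with a simple accuracy value. It is a standalone toy example: `labels` and `predictions` stand in for `self.labels` and whatever `_run_task` produced, and the label strings are invented.

```python
# Hypothetical sketch of a metric value a _compute_metrics implementation
# could produce: fraction of predictions matching the reference labels.
labels = ["T cell", "B cell", "T cell", "NK cell"]
predictions = ["T cell", "B cell", "B cell", "NK cell"]

matches = sum(p == t for p, t in zip(predictions, labels))
metric_value = matches / len(labels)  # 3 of 4 match -> 0.75
```

The resulting `metric_value` is what you would wrap in a `MetricResult` and return from `_compute_metrics`.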
5. Register the Task
Add an entry for your task to the `src.czbenchmarks.tasks.utils.TASK_NAMES` array so the framework can discover it.
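The registration step amounts to adding one string to a list. The sketch below is only illustrative: the pre-existing entries and the exact shape of `TASK_NAMES` are assumptions, not the library's actual values, so check `czbenchmarks/tasks/utils.py` for the real contents.

```python
# Hypothetical sketch of czbenchmarks/tasks/utils.py after registration;
# the existing entries here are placeholders, not the library's real values.
TASK_NAMES = [
    "clustering",
    "embedding_quality",
]

TASK_NAMES.append("my_task")  # register the new task
```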
Best Practices
Single Responsibility: Ensure your task has a clear and focused purpose.
Error Handling: Include robust error handling and logging.
Code Quality: Use type hints and follow the project’s coding style.
Testing: Write unit tests to validate your task’s functionality.
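A unit test for the metric path can be written without the full framework by stubbing the pieces the task touches. In this sketch, `MetricStub` and `MyTaskStub` are hypothetical stand-ins for `MetricResult` and your task class; only the `.metric_type`/`.value` attributes are assumed.

```python
import unittest

class MetricStub:
    """Minimal stand-in for MetricResult (assumed attributes only)."""
    def __init__(self, metric_type, value):
        self.metric_type = metric_type
        self.value = value

class MyTaskStub:
    """Stand-in for a task whose _compute_metrics returns one metric."""
    def _compute_metrics(self):
        return [MetricStub("my_metric", 0.5)]

class TestMyTask(unittest.TestCase):
    def test_compute_metrics_returns_named_metric(self):
        results = MyTaskStub()._compute_metrics()
        self.assertEqual(len(results), 1)
        self.assertEqual(results[0].metric_type, "my_metric")
```

In a real test you would construct your task with fixture data and assert on the `MetricResult` objects it returns.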