czbenchmarks.tasks.clustering

Attributes

logger

Classes

ClusteringTaskInput

Input model for the clustering task.

ClusteringOutput

Output for the clustering task.

ClusteringTask

Task for evaluating clustering performance against ground truth labels.

Module Contents

czbenchmarks.tasks.clustering.logger
class czbenchmarks.tasks.clustering.ClusteringTaskInput(/, **data: Any)[source]

Bases: czbenchmarks.tasks.task.TaskInput

Input model for the clustering task.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

obs: Annotated[pandas.DataFrame, Field(description='Cell metadata DataFrame (e.g. the `obs` from an AnnData object).')]
input_labels: Annotated[czbenchmarks.types.ListLike, Field(description='Ground truth labels for metric calculation (e.g. `obs.cell_type` from an AnnData object).')]
use_rep: Annotated[str, Field(description="Data representation to use for clustering (e.g. the 'X' or obsm['X_pca'] from an AnnData object).")] = 'X'
n_iterations: Annotated[int, Field(description='Number of iterations for the Leiden algorithm.')] = 2
flavor: Annotated[Literal['leidenalg', 'igraph'], Field(description='Algorithm for Leiden community detection.')] = 'igraph'
key_added: Annotated[str, Field(description='Key in AnnData.obs where cluster assignments are stored.')] = 'leiden'
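The field list above can be summarized with a minimal sketch. Note this is a plain-dataclass stand-in mirroring the fields and defaults, not the real Pydantic model (constructing ClusteringTaskInput itself requires czbenchmarks and a populated `obs` DataFrame):

```python
from dataclasses import dataclass
from typing import Any, List, Literal

@dataclass
class ClusteringTaskInputSketch:
    """Stand-in mirroring ClusteringTaskInput's fields and defaults."""
    obs: Any                                            # cell metadata (AnnData .obs)
    input_labels: List[Any]                             # ground-truth labels for metrics
    use_rep: str = "X"                                  # representation to cluster on
    n_iterations: int = 2                               # Leiden iterations
    flavor: Literal["leidenalg", "igraph"] = "igraph"   # Leiden backend
    key_added: str = "leiden"                           # obs column for cluster labels

# Only obs and input_labels are required; everything else has a default.
inp = ClusteringTaskInputSketch(obs=None, input_labels=["T cell", "B cell"])
print(inp.use_rep, inp.flavor, inp.key_added)  # → X igraph leiden
```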
class czbenchmarks.tasks.clustering.ClusteringOutput(/, **data: Any)[source]

Bases: czbenchmarks.tasks.task.TaskOutput

Output for the clustering task.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

predicted_labels: List[int]
class czbenchmarks.tasks.clustering.ClusteringTask(*, random_seed: int = RANDOM_SEED)[source]

Bases: czbenchmarks.tasks.task.Task

Task for evaluating clustering performance against ground truth labels.

This task performs clustering on embeddings and evaluates the results against the ground truth labels with two clustering metrics: Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI).
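To make the evaluation step concrete, here is a pure-Python sketch of the ARI computation (a hypothetical re-implementation for illustration; the task itself presumably delegates to a metrics library, and NMI is computed analogously from the same contingency counts):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index between two labelings (illustrative sketch)."""
    n = len(true_labels)
    # Contingency counts: how often each (true, predicted) pair co-occurs.
    pairs = Counter(zip(true_labels, pred_labels))
    row = Counter(true_labels)
    col = Counter(pred_labels)
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in row.values())
    sum_b = sum(comb(c, 2) for c in col.values())
    expected = sum_a * sum_b / comb(n, 2)   # expected index under chance
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

# Perfect agreement (up to label renaming) scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], ["a", "a", "b", "b"]))  # → 1.0
```

ARI is chance-adjusted, so random cluster assignments score near 0 regardless of the number of clusters, which makes it a fairer comparison across clusterings than raw pairwise agreement.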

display_name = 'Clustering'
description = 'Evaluate clustering performance against ground truth labels using ARI and NMI metrics.'
input_model
baseline_model