czbenchmarks.tasks.clustering

Attributes

logger

Classes

ClusteringTaskInput

Input model for the clustering task.

ClusteringOutput

Output for the clustering task.

ClusteringTask

Task for evaluating clustering performance against ground truth labels.

Module Contents

czbenchmarks.tasks.clustering.logger
class czbenchmarks.tasks.clustering.ClusteringTaskInput(/, **data: Any)[source]

Bases: czbenchmarks.tasks.task.TaskInput

Input model for the clustering task.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

obs: Annotated[pandas.DataFrame, Field(description='Cell metadata DataFrame (e.g. the `obs` from an AnnData object).')]
input_labels: Annotated[czbenchmarks.types.ListLike, Field(description='Ground truth labels for metric calculation (e.g. `obs.cell_type` from an AnnData object).')]
use_rep: Annotated[str, Field(description="Data representation to use for clustering (e.g. the 'X' or obsm['X_pca'] from an AnnData object).")] = 'X'
n_iterations: Annotated[int, Field(description='Number of iterations for the Leiden algorithm.')] = 2
flavor: Annotated[Literal['leidenalg', 'igraph'], Field(description='Algorithm for Leiden community detection.')] = 'igraph'
key_added: Annotated[str, Field(description='Key in AnnData.obs where cluster assignments are stored.')] = 'leiden'
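The field list above can be summarized with a minimal sketch. Note this is a plain-dataclass stand-in mirroring the fields and defaults, not the real Pydantic model (constructing ClusteringTaskInput itself requires czbenchmarks and a populated `obs` DataFrame):

```python
from dataclasses import dataclass
from typing import Any, List, Literal

@dataclass
class ClusteringTaskInputSketch:
    """Stand-in mirroring ClusteringTaskInput's fields and defaults."""
    obs: Any                                            # cell metadata (AnnData .obs)
    input_labels: List[Any]                             # ground-truth labels for metrics
    use_rep: str = "X"                                  # representation to cluster on
    n_iterations: int = 2                               # Leiden iterations
    flavor: Literal["leidenalg", "igraph"] = "igraph"   # Leiden backend
    key_added: str = "leiden"                           # obs column for cluster labels

# Only obs and input_labels are required; everything else has a default.
inp = ClusteringTaskInputSketch(obs=None, input_labels=["T cell", "B cell"])
print(inp.use_rep, inp.flavor, inp.key_added)  # → X igraph leiden
```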
class czbenchmarks.tasks.clustering.ClusteringOutput(/, **data: Any)[source]

Bases: czbenchmarks.tasks.task.TaskOutput

Output for the clustering task.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

predicted_labels: List[int]
class czbenchmarks.tasks.clustering.ClusteringTask(*, random_seed: int = RANDOM_SEED)[source]

Bases: czbenchmarks.tasks.task.Task

Task for evaluating clustering performance against ground truth labels.

This task performs clustering on embeddings and evaluates the results against the ground truth labels with two clustering metrics: Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI).
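To make the evaluation step concrete, here is a pure-Python sketch of the ARI computation (a hypothetical re-implementation for illustration; the task itself presumably delegates to a metrics library, and NMI is computed analogously from the same contingency counts):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index between two labelings (illustrative sketch)."""
    n = len(true_labels)
    # Contingency counts: how often each (true, predicted) pair co-occurs.
    pairs = Counter(zip(true_labels, pred_labels))
    row = Counter(true_labels)
    col = Counter(pred_labels)
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in row.values())
    sum_b = sum(comb(c, 2) for c in col.values())
    expected = sum_a * sum_b / comb(n, 2)   # expected index under chance
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

# Perfect agreement (up to label renaming) scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], ["a", "a", "b", "b"]))  # → 1.0
```

ARI is chance-adjusted, so random cluster assignments score near 0 regardless of the number of clusters, which makes it a fairer comparison across clusterings than raw pairwise agreement.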

display_name = 'Clustering'
description = 'Evaluate clustering performance against ground truth labels using ARI and NMI metrics.'
input_model
baseline_model