czbenchmarks.tasks.clustering

Attributes

logger

Classes

ClusteringTaskInput

Input for the clustering task.

ClusteringOutput

Output for clustering task.

ClusteringTask

Task for evaluating clustering performance against ground truth labels.

Module Contents

czbenchmarks.tasks.clustering.logger
class czbenchmarks.tasks.clustering.ClusteringTaskInput(/, **data: Any)[source]

Bases: czbenchmarks.tasks.task.TaskInput

Input for the clustering task.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.
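The positional-only note can be illustrated with plain Python. The sketch below is not the pydantic implementation; SketchModel and WithoutSlash are hypothetical stand-in classes that only show why the "/" in the signature matters when keyword data may contain a key named self.

```python
class SketchModel:
    """Hypothetical stand-in that mimics the (/, **data) signature."""

    def __init__(self, /, **data):
        # Because `self` is positional-only, a keyword argument named
        # "self" lands in **data instead of clashing with the instance.
        self.fields = dict(data)


class WithoutSlash:
    """Same idea without the "/" marker, for contrast."""

    def __init__(self, **data):
        self.fields = dict(data)


model = SketchModel(self="a field named self", use_rep="X")
print(model.fields)

try:
    WithoutSlash(self="a field named self")
    clash_detected = False
except TypeError:
    # Python reports multiple values for the argument 'self'.
    clash_detected = True
print(clash_detected)
```

Without the "/" marker, the keyword self collides with the bound instance argument, so the call fails before any validation could run.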

obs: pandas.DataFrame
input_labels: czbenchmarks.types.ListLike
use_rep: str = 'X'
n_iterations: int = 2
flavor: Literal['leidenalg', 'igraph'] = 'igraph'
key_added: str = 'leiden'
class czbenchmarks.tasks.clustering.ClusteringOutput(/, **data: Any)[source]

Bases: czbenchmarks.tasks.task.TaskOutput

Output for clustering task.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

predicted_labels: List[int]
class czbenchmarks.tasks.clustering.ClusteringTask(*, random_seed: int = RANDOM_SEED)[source]

Bases: czbenchmarks.tasks.task.Task

Task for evaluating clustering performance against ground truth labels.

This task performs clustering on embeddings and evaluates the results using multiple clustering metrics (ARI and NMI).

Parameters:

random_seed (int) – Random seed for reproducibility
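For readers unfamiliar with the two metrics named above, here is a minimal standard-library sketch of the textbook definitions of ARI and NMI. This page does not say how the task computes them, so the function names below are hypothetical, and normalizing NMI by the arithmetic mean of the two entropies is an assumption rather than a documented choice of the library.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(true_labels, predicted_labels):
    """Adjusted Rand index computed from the label contingency counts."""
    n = len(true_labels)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: treat as trivial agreement
    joint = Counter(zip(true_labels, predicted_labels))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(predicted_labels).values())
    expected = sum_true * sum_pred / total_pairs
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster on both sides
    return (sum_joint - expected) / (maximum - expected)


def normalized_mutual_info(true_labels, predicted_labels):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, predicted_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    mutual_info = sum(
        (c / n) * log(c * n / (true_counts[t] * pred_counts[p]))
        for (t, p), c in joint.items()
    )
    entropy_true = -sum((c / n) * log(c / n) for c in true_counts.values())
    entropy_pred = -sum((c / n) * log(c / n) for c in pred_counts.values())
    mean_entropy = (entropy_true + entropy_pred) / 2
    if mean_entropy == 0:
        return 1.0  # both labelings consist of a single cluster
    return mutual_info / mean_entropy


# Toy labels: the predicted clusters match the ground truth up to renaming.
ground_truth = ["B cell", "B cell", "T cell", "T cell"]
relabelled = [1, 1, 0, 0]
ari_same = adjusted_rand_index(ground_truth, relabelled)
nmi_same = normalized_mutual_info(ground_truth, relabelled)
print(ari_same, nmi_same)
```

Both metrics depend only on how samples are grouped, not on the cluster identifiers, so a relabelled but otherwise identical partition scores 1.0 on each (up to floating-point rounding).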

display_name = 'Clustering'
description = 'Evaluate clustering performance against ground truth labels using ARI and NMI metrics.'
input_model