czbenchmarks.tasks.clustering

Attributes

logger

Classes

ClusteringTask

Task for evaluating clustering performance against ground truth labels.

Module Contents

czbenchmarks.tasks.clustering.logger
class czbenchmarks.tasks.clustering.ClusteringTask(label_key: str, random_seed: int = RANDOM_SEED, n_iterations: int = N_ITERATIONS, flavor: str = FLAVOR, key_added: str = KEY_ADDED)

Bases: czbenchmarks.tasks.base.BaseTask

Task for evaluating clustering performance against ground truth labels.

This task clusters the embeddings and scores the resulting cluster assignments against the ground truth labels using two metrics: the adjusted Rand index (ARI) and normalized mutual information (NMI).

Parameters:
  • label_key (str) – Key to access ground truth labels in metadata

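  • random_seed (int) – Random seed for reproducibility

  • n_iterations (int) – Number of iterations to run the clustering algorithm

  • flavor (str) – Implementation flavor of the clustering algorithm

  • key_added (str) – Key under which the cluster assignments are stored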
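The two metrics named in the class description can be illustrated with a small, dependency-free sketch. This is not the czbenchmarks implementation; the function names `adjusted_rand_index` and `normalized_mutual_info` are hypothetical, and the NMI variant shown normalizes by the arithmetic mean of the two entropies, which is an assumption about the convention in use.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # fewer than two items: no pairs to disagree on
    cells = Counter(zip(labels_true, labels_pred))  # contingency table
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. one cluster on both sides
    return (sum_cells - expected) / (maximum - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """Mutual information divided by the mean of the two entropies."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    mi = sum(
        (c / n) * log(n * c / (rows[u] * cols[v]))
        for (u, v), c in cells.items()
    )
    h_true = -sum((c / n) * log(c / n) for c in rows.values())
    h_pred = -sum((c / n) * log(c / n) for c in cols.values())
    mean_entropy = (h_true + h_pred) / 2
    if mean_entropy == 0:
        return 1.0  # both labelings are a single cluster
    return mi / mean_entropy


# Ground truth labels and cluster ids for four cells (made-up values).
truth = ["T", "T", "B", "B"]
clusters = [1, 1, 0, 0]
ari = adjusted_rand_index(truth, clusters)
nmi = normalized_mutual_info(truth, clusters)
```

Both metrics compare partitions rather than label values, so a clustering that reproduces the ground truth groups under different cluster ids still receives the maximum score.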

label_key
random_seed = 42
n_iterations = 2
flavor = 'igraph'
key_added = 'leiden'
property display_name: str

Human-readable name to use when displaying task results.

property required_inputs: Set[czbenchmarks.datasets.DataType]

Required input data types.

Returns:

Set of required input DataTypes (metadata with labels)

property required_outputs: Set[czbenchmarks.datasets.DataType]

Required output data types.

Returns:

Set of output DataTypes that models must produce for this task to run (the embedding to cluster)
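One way to read the two properties above is as sets that a caller checks before running the task. The sketch below assumes a hypothetical stand-in `DataType` enum and a hypothetical helper `missing_types`; the real `czbenchmarks.datasets.DataType` members and the library's own validation logic may differ.

```python
from enum import Enum
from typing import Set


class DataType(Enum):
    """Hypothetical stand-in for czbenchmarks.datasets.DataType."""

    METADATA = "metadata"
    EMBEDDING = "embedding"
    ANNDATA = "anndata"


def missing_types(required: Set[DataType], available: Set[DataType]) -> Set[DataType]:
    """Return the required data types that are not available."""
    return required - available


# What the clustering task is documented to need (assumed member names).
required_inputs = {DataType.METADATA}
required_outputs = {DataType.EMBEDDING}

# A dataset that has labels, but no model has produced an embedding yet.
available_inputs = {DataType.METADATA, DataType.ANNDATA}
available_outputs: Set[DataType] = set()

can_run = (
    not missing_types(required_inputs, available_inputs)
    and not missing_types(required_outputs, available_outputs)
)
```

Expressing the requirements as sets keeps the check to a single set difference, and the result names exactly which data types are still missing.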