czbenchmarks.metrics.types

Classes

MetricType

Enumeration of all supported metric types.

MetricInfo

Stores metadata about a metric.

MetricRegistry

Central registry for all available metrics.

MetricResult

Represents the result of a single metric computation.

AggregatedMetricResult

Represents the aggregated result of multiple metric computations.

Module Contents

class czbenchmarks.metrics.types.MetricType(*args, **kwds)[source]

Bases: enum.Enum

Enumeration of all supported metric types.

Defines unique identifiers for evaluation metrics that can be computed. Each metric type corresponds to a specific evaluation metric, and its value is used as the string identifier for that metric in result dictionaries.

Examples

  • Clustering metrics: Adjusted Rand Index, Normalized Mutual Information

  • Embedding quality metrics: Silhouette Score

  • Integration metrics: Entropy Per Cell, Batch Silhouette

  • Perturbation metrics: Mean Squared Error, Pearson Correlation

ADJUSTED_RAND_INDEX = 'adjusted_rand_index'
NORMALIZED_MUTUAL_INFO = 'normalized_mutual_info'
SILHOUETTE_SCORE = 'silhouette_score'
ENTROPY_PER_CELL = 'entropy_per_cell'
BATCH_SILHOUETTE = 'batch_silhouette'
MEAN_SQUARED_ERROR = 'mean_squared_error'
PEARSON_CORRELATION = 'pearson_correlation'
ACCURACY = 'accuracy'
ACCURACY_CALCULATION = 'accuracy_calculation'
MEAN_FOLD_ACCURACY = 'mean_fold_accuracy'
AUROC = 'auroc'
MEAN_FOLD_AUROC = 'mean_fold_auroc'
F1_SCORE = 'f1'
F1_CALCULATION = 'f1_calculation'
MEAN_FOLD_F1_SCORE = 'mean_fold_f1'
JACCARD = 'jaccard'
PRECISION = 'precision'
PRECISION_CALCULATION = 'precision_calculation'
MEAN_FOLD_PRECISION = 'mean_fold_precision'
RECALL = 'recall'
RECALL_CALCULATION = 'recall_calculation'
MEAN_FOLD_RECALL = 'mean_fold_recall'
SPEARMAN_CORRELATION_CALCULATION = 'spearman_correlation_calculation'
SEQUENTIAL_ALIGNMENT = 'sequential_alignment'
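
Because each member’s value doubles as its string identifier, results keyed by these strings can be mapped back to enum members. A minimal sketch (only standard Enum behavior is assumed):

    from czbenchmarks.metrics.types import MetricType

    metric = MetricType.ADJUSTED_RAND_INDEX
    print(metric.value)  # 'adjusted_rand_index'

    # Standard Enum lookup: round-trip from the string identifier to the member
    assert MetricType('adjusted_rand_index') is MetricType.ADJUSTED_RAND_INDEX
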
class czbenchmarks.metrics.types.MetricInfo(/, **data: Any)[source]

Bases: pydantic.BaseModel

Stores metadata about a metric.

Encapsulates information required for metric computation, including:

  • The function implementing the metric.

  • Required arguments for the metric function.

  • Default parameters for the metric function.

  • An optional description of the metric’s purpose.

  • Tags for grouping related metrics.

func

The function that computes the metric.

Type:

Callable

required_args

Names of required arguments for the metric function.

Type:

Set[str]

default_params

Default parameters for the metric function.

Type:

Dict[str, Any]

description

Documentation string describing the metric.

Type:

Optional[str]

tags

Tags for categorizing metrics.

Type:

Set[str]

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

func: Callable

The function that computes the metric

required_args: Set[str]

Set of required argument names

default_params: Dict[str, Any]

Default parameters for the metric function

description: str | None = None

Optional documentation string for custom metrics

tags: Set[str] = None

Set of tags for grouping related metrics
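
As a sketch of how these fields fit together, the following constructs a MetricInfo by hand around scikit-learn’s adjusted_rand_score. In practice entries are typically created through MetricRegistry.register, and the concrete metrics shipped with the registry are not documented here:

    from sklearn.metrics import adjusted_rand_score

    from czbenchmarks.metrics.types import MetricInfo

    info = MetricInfo(
        func=adjusted_rand_score,
        required_args={'labels_true', 'labels_pred'},
        default_params={},
        description='Adjusted Rand Index between two clusterings.',
        tags={'clustering'},
    )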

class czbenchmarks.metrics.types.MetricRegistry[source]

Central registry for all available metrics.

Provides functionality for registering, validating, and computing metrics. Each metric is associated with a unique MetricType identifier and metadata stored in a MetricInfo object.

Features:

  • Register new metrics with required arguments, default parameters, and tags.

  • Compute metrics by passing required arguments and merging them with defaults.

  • Retrieve metadata about registered metrics.

  • List available metrics, optionally filtered by tags.

_metrics

Internal storage for registered metrics.

Type:

Dict[MetricType, MetricInfo]

register(metric_type: MetricType, func: Callable, required_args: Set[str] | None = None, default_params: Dict[str, Any] | None = None, description: str = '', tags: Set[str] | None = None) → None[source]

Register a new metric in the registry.

Associates a metric type with its computation function, required arguments, default parameters, and metadata. Registered metrics can later be computed using the compute method.

Parameters:
  • metric_type (MetricType) – Unique identifier for the metric.

  • func (Callable) – Function that computes the metric.

  • required_args (Optional[Set[str]]) – Names of required arguments for the metric function.

  • default_params (Optional[Dict[str, Any]]) – Default parameters for the metric function.

  • description (str) – Documentation string describing the metric’s purpose.

  • tags (Optional[Set[str]]) – Tags for categorizing the metric.

Raises:

TypeError – If metric_type is not an instance of MetricType.
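
A hedged sketch of registering a metric on a fresh registry (the library may ship a pre-populated registry instance; only the signature documented above is relied on):

    from sklearn.metrics import adjusted_rand_score

    from czbenchmarks.metrics.types import MetricRegistry, MetricType

    registry = MetricRegistry()
    registry.register(
        MetricType.ADJUSTED_RAND_INDEX,
        func=adjusted_rand_score,
        required_args={'labels_true', 'labels_pred'},
        description='Adjusted Rand Index between two clusterings.',
        tags={'clustering'},
    )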

compute(metric_type: MetricType, **kwargs) → float[source]

Compute a registered metric with the given parameters.

Validates required arguments and merges them with default parameters before calling the metric’s computation function.

Parameters:
  • metric_type (MetricType) – Type of metric to compute.

  • **kwargs – Arguments to pass to the metric function.

Returns:

Computed metric value.

Return type:

float

Raises:

ValueError – If the metric type is unknown or required arguments are missing.
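
Continuing the registration sketch above, the required arguments are passed as keyword arguments and merged with any registered defaults:

    score = registry.compute(
        MetricType.ADJUSTED_RAND_INDEX,
        labels_true=[0, 0, 1, 1],
        labels_pred=[0, 0, 1, 0],
    )
    # score is a float; omitting labels_true or labels_pred raises ValueError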

get_info(metric_type: MetricType) → MetricInfo[source]

Get metadata about a metric.

Parameters:

metric_type (MetricType) – Type of metric to retrieve.

Returns:

MetricInfo object with the metric’s metadata.

Raises:

ValueError – If the metric type is unknown.
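
For example, inspecting the metadata registered in the sketch above:

    info = registry.get_info(MetricType.ADJUSTED_RAND_INDEX)
    print(info.description)    # 'Adjusted Rand Index between two clusterings.'
    print(info.required_args)  # {'labels_true', 'labels_pred'}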

list_metrics(tags: Set[str] | None = None) → Set[MetricType][source]

List available metrics, optionally filtered by tags.

Retrieves all registered metrics, or filters them based on the provided tags.

Parameters:

tags (Optional[Set[str]]) – Tags to filter metrics. Only metrics with all specified tags will be returned.

Returns:

Set of matching metric types.

Return type:

Set[MetricType]
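
For example, filtering on the 'clustering' tag used in the registration sketch above (built-in tag names are not documented here):

    clustering_metrics = registry.list_metrics(tags={'clustering'})
    # Only metrics registered with ALL of the requested tags are returned
    print(MetricType.ADJUSTED_RAND_INDEX in clustering_metrics)  # True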

class czbenchmarks.metrics.types.MetricResult(/, **data: Any)[source]

Bases: pydantic.BaseModel

Represents the result of a single metric computation.

Encapsulates the computed value, associated metric type, and any parameters used during computation. Provides functionality for generating aggregation keys to group similar metrics.

metric_type

The type of metric computed.

Type:

MetricType

value

The computed metric value.

Type:

float

params

Parameters used during computation.

Type:

Optional[Dict[str, Any]]

aggregation_key

Generates a key based on the metric type and parameters to aggregate similar metrics together.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

metric_type: MetricType
value: float
params: Dict[str, Any] | None = None
property aggregation_key: str

Return a key based on the metric type and params so that results for the same metric can be aggregated together.
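
A minimal sketch of constructing results and grouping them by aggregation key; the exact key format is an implementation detail, and only its documented grouping role is assumed:

    from czbenchmarks.metrics.types import MetricResult, MetricType

    results = [
        MetricResult(metric_type=MetricType.ACCURACY, value=0.91),
        MetricResult(metric_type=MetricType.ACCURACY, value=0.88),
    ]

    # Results with the same metric type and params share an aggregation key
    assert results[0].aggregation_key == results[1].aggregation_key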

class czbenchmarks.metrics.types.AggregatedMetricResult(/, **data: Any)[source]

Bases: pydantic.BaseModel

Represents the aggregated result of multiple metric computations.

Stores statistical information about a set of metric values, including the mean, standard deviation, and raw values. Useful for summarizing metrics computed across multiple runs or folds.

metric_type

The type of metric being aggregated.

Type:

MetricType

params

Parameters used during computation.

Type:

Dict[str, Any] | None

n_values

Number of values aggregated.

Type:

int

value

Mean value of the aggregated metrics.

Type:

float

value_std_dev

Standard deviation of the aggregated metrics.

Type:

float | None

values_raw

Raw values of the metrics being aggregated.

Type:

list[float]

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

metric_type: MetricType
params: Dict[str, Any] | None = None
n_values: int
value: float
value_std_dev: float | None
values_raw: list[float]
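
A sketch that summarizes the two MetricResult values from the previous example by hand; whether the library also provides a helper that builds this object from a list of MetricResult instances is not documented here:

    from statistics import mean, stdev

    from czbenchmarks.metrics.types import AggregatedMetricResult, MetricType

    values = [0.91, 0.88]
    aggregated = AggregatedMetricResult(
        metric_type=MetricType.ACCURACY,
        params=None,
        n_values=len(values),
        value=mean(values),
        value_std_dev=stdev(values),
        values_raw=values,
    )
    print(aggregated.value)  # ~0.895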