# Metrics

The `czbenchmarks.metrics` module provides a unified and extensible framework for computing performance metrics across all evaluation tasks.

## Overview

At the core of this module is a centralized registry, `MetricRegistry`, which stores all supported metrics. Each metric is registered with a unique type, required arguments, default parameters, a description, and a set of descriptive tags.

### Purpose

- Allows tasks to declare and compute metrics in a unified, type-safe, and extensible manner.
- Ensures metrics are reproducible and callable via shared interfaces across tasks such as clustering, embedding, and label prediction.

## Key Components

- [MetricRegistry](../autoapi/czbenchmarks/metrics/types/index)
  A class that registers and manages metric functions, performs argument validation, and handles invocation.
- [MetricType](../autoapi/czbenchmarks/metrics/types/index)
  An `Enum` defining all supported metric names. Each task refers to `MetricType` members to identify which metrics to compute.
- **Tags:** Each metric is tagged with its associated category to allow filtering:
  - `clustering`: ARI, NMI
  - `embedding`: Silhouette Score
  - `integration`: Entropy per Cell, Batch Silhouette
  - `label_prediction`: Accuracy, F1, Precision, Recall, AUROC
  - `perturbation`: MSE, R², Jaccard Similarity

## Supported Metrics

The following metrics are pre-registered:

| **Metric Type**          | **Task**         | **Description** |
|--------------------------|------------------|-----------------|
| `adjusted_rand_index`    | clustering       | Measures the similarity between two clusterings, adjusted for chance. A higher value indicates better alignment. |
| `normalized_mutual_info` | clustering       | Quantifies the amount of shared information between two clusterings, normalized to ensure comparability. |
| `silhouette_score`       | embedding        | Evaluates how well-separated clusters are in an embedding space. Higher scores indicate better-defined clusters. |
| `entropy_per_cell`       | integration      | Assesses the mixing of batch labels at the single-cell level. Higher entropy indicates better integration. |
| `batch_silhouette`       | integration      | Combines silhouette scoring with batch information to evaluate clustering quality while accounting for batch effects. |
| `mean_squared_error`     | perturbation     | Calculates the average squared difference between predicted and true values, indicating prediction accuracy. |
| `r2_score`               | perturbation     | Measures the proportion of variance in true values explained by predictions. Higher values indicate better predictions. |
| `jaccard`                | perturbation     | Computes the similarity between predicted and true sets of top differentially expressed (DE) genes. |
| `mean_fold_accuracy`     | label_prediction | Average accuracy across k-fold cross-validation splits, indicating overall classification performance. |
| `mean_fold_f1`           | label_prediction | Average F1 score across folds, balancing precision and recall for classification tasks. |
| `mean_fold_precision`    | label_prediction | Average precision across folds, reflecting the proportion of true positives among predicted positives. |
| `mean_fold_recall`       | label_prediction | Average recall across folds, indicating the proportion of true positives correctly identified. |
| `mean_fold_auroc`        | label_prediction | Average area under the ROC curve across folds, measuring the ability to distinguish between classes. |
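
For intuition about what the clustering scores in this table measure, the snippet below computes ARI and NMI directly with scikit-learn on a toy example. This is an illustration of the standard definitions only; it is an assumption on our part that the registered metrics follow these scikit-learn implementations, and the snippet does not use the registry API.

```python
# Illustration of the standard definitions behind the clustering metrics above.
# Assumption: the registered metrics follow the scikit-learn implementations.
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

labels_true = [0, 0, 1, 1, 2, 2]  # ground-truth cluster assignments
labels_pred = [0, 0, 1, 2, 2, 2]  # predicted cluster assignments

ari = adjusted_rand_score(labels_true, labels_pred)           # 1.0 means identical clusterings
nmi = normalized_mutual_info_score(labels_true, labels_pred)  # in [0, 1]; 1.0 means identical
print(f"ARI={ari:.3f}, NMI={nmi:.3f}")
```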
## How to Compute a Metric

Use `metrics_registry.compute()` inside your task's `_compute_metrics()` method:

```python
from czbenchmarks.metrics.types import MetricResult, MetricType, metrics_registry

value = metrics_registry.compute(
    MetricType.ADJUSTED_RAND_INDEX,
    labels_true=true_labels,
    labels_pred=predicted_labels,
)

# Wrap the computed value in a result object
result = MetricResult(metric_type=MetricType.ADJUSTED_RAND_INDEX, value=value)
```

## Adding a Custom Metric

To add a new metric to the registry:

1. **Add a new member to the enum:** Edit `MetricType` in `czbenchmarks/metrics/types.py`:

   ```python
   class MetricType(Enum):
       ...
       MY_CUSTOM_METRIC = "my_custom_metric"
   ```

2. **Define the metric function:**

   ```python
   def my_custom_metric(y_true, y_pred):
       # Compute and return a single float value
       return float(...)
   ```

3. **Register it in the registry:** Add to `czbenchmarks/metrics/implementations.py`:

   ```python
   metrics_registry.register(
       MetricType.MY_CUSTOM_METRIC,
       func=my_custom_metric,
       required_args={"y_true", "y_pred"},
       default_params={"normalize": True},
       description="Description of your custom metric",
       tags={"my_category"},
   )
   ```

4. **Use it in your task or CLI:** The metric is now available for any task to compute.

## Using Metric Tags

You can list metrics by category using tags:

```python
metrics_registry.list_metrics(tags={"clustering"})  # returns a set of MetricType
```

## Best Practices

When implementing or using metrics, follow these guidelines to ensure consistency and reliability:

1. **Type Safety:** Always use the `MetricType` enum instead of string literals to refer to metrics. This ensures type safety and avoids errors due to typos.
2. **Pure Functions:** Metrics should be **pure functions**, meaning they must not have side effects. This ensures reproducibility and consistency across computations.
3. **Return Types:** All metric functions must return a `float` value to maintain uniformity in results.
4. **Validation:**
   - Validate inputs within your metric function if it makes strict assumptions about input shapes or types.
   - Declare required arguments so the registry can verify that the metric is called with the correct parameters.
5. **Default Parameters:** Use `default_params` only for optional keyword arguments. Avoid using them for required arguments.
6. **Tags:** Assign appropriate tags to metrics for categorization. Tags help in filtering and organizing metrics by their use cases (e.g., `clustering`, `embedding`, `label_prediction`).
7. **Documentation:**
   - Provide a short and clear `description` for each metric to explain its purpose and usage.
   - Document all parameters and their expected types or shapes to guide users effectively.

A short sketch putting these practices together appears after the references below.

## Related References

- [MetricRegistry API](../autoapi/czbenchmarks/metrics/types/index)
- [Add New Metric Guide](../how_to_guides/add_new_metric)
- [ClusteringTask](../autoapi/czbenchmarks/tasks/clustering/index)
- [PerturbationTask](../autoapi/czbenchmarks/tasks/single_cell/perturbation/index)
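
As noted under Best Practices, the sketch below ties the steps together: a pure metric function that validates its inputs and returns a `float`, registered and then computed through the registry. The metric itself, the `normalize` parameter, and the `MY_CUSTOM_METRIC` enum member (added as in step 1 of "Adding a Custom Metric") are illustrative assumptions, not part of the library.

```python
# A sketch, not library code: assumes MY_CUSTOM_METRIC was added to MetricType
# as shown in step 1 of "Adding a Custom Metric" above.
import numpy as np

from czbenchmarks.metrics.types import MetricType, metrics_registry


def my_custom_metric(y_true, y_pred, normalize=True):
    """Pure function: validates inputs, has no side effects, returns a float."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    error = np.abs(y_true - y_pred).sum()
    if normalize:
        error /= y_true.size
    return float(error)


metrics_registry.register(
    MetricType.MY_CUSTOM_METRIC,  # hypothetical enum member from step 1
    func=my_custom_metric,
    required_args={"y_true", "y_pred"},
    default_params={"normalize": True},
    description="Mean absolute error between predicted and true values (illustrative)",
    tags={"my_category"},
)

value = metrics_registry.compute(
    MetricType.MY_CUSTOM_METRIC,
    y_true=[0.0, 1.0, 2.0],
    y_pred=[0.1, 0.9, 2.2],
)
```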