czbenchmarks.metrics.utils
Functions
- nearest_neighbors_hnsw: Find nearest neighbors using HNSW algorithm.
- compute_entropy_per_cell: Compute entropy of batch labels in local neighborhoods.
- jaccard_score: Compute Jaccard similarity between true and predicted values.
- mean_fold_metric: Compute mean of a metric across folds.
Module Contents
- czbenchmarks.metrics.utils.nearest_neighbors_hnsw(data: numpy.ndarray, expansion_factor: int = 200, max_links: int = 48, n_neighbors: int = 100) tuple[numpy.ndarray, numpy.ndarray] [source]
Find nearest neighbors using HNSW algorithm.
- Parameters:
data – Input data matrix of shape (n_samples, n_features)
expansion_factor – Size of dynamic candidate list for search
max_links – Number of bi-directional links created for every new element
n_neighbors – Number of nearest neighbors to find
- Returns:
A tuple containing:
- Indices array of shape (n_samples, n_neighbors)
- Distances array of shape (n_samples, n_neighbors)
- Return type:
tuple[numpy.ndarray, numpy.ndarray]
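A minimal usage sketch based on the signature above; the random embedding, its dimensions, and the neighbor count are illustrative values, not defaults recommended by the library.

```python
import numpy as np
from czbenchmarks.metrics.utils import nearest_neighbors_hnsw

# Illustrative data: 1,000 samples embedded in 50 dimensions
embedding = np.random.default_rng(0).random((1000, 50)).astype(np.float32)

# Query the 15 nearest neighbors of every row, using default HNSW settings
indices, distances = nearest_neighbors_hnsw(embedding, n_neighbors=15)

print(indices.shape)    # (1000, 15)
print(distances.shape)  # (1000, 15)
```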
- czbenchmarks.metrics.utils.compute_entropy_per_cell(X: numpy.ndarray, labels: pandas.Categorical | pandas.Series | numpy.ndarray) numpy.ndarray [source]
Compute entropy of batch labels in local neighborhoods.
For each cell, finds nearest neighbors and computes entropy of batch label distribution in that neighborhood.
- Parameters:
X – Cell embedding matrix of shape (n_cells, n_features)
labels – Batch labels for each cell (Categorical, Series, or array)
- Returns:
Array of entropy values for each cell, normalized by log of number of batches
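A short sketch of applying the metric, assuming only the signature above; the synthetic embedding and two-batch labels are illustrative.

```python
import numpy as np
import pandas as pd
from czbenchmarks.metrics.utils import compute_entropy_per_cell

rng = np.random.default_rng(0)
X = rng.random((500, 30))  # cell embedding of shape (n_cells, n_features)
batches = pd.Series(rng.choice(["batch_a", "batch_b"], size=500))

entropy = compute_entropy_per_cell(X, batches)
print(entropy.shape)  # (500,)

# Because values are normalized by log of the number of batches,
# entries near 1 indicate well-mixed neighborhoods and entries near 0
# indicate neighborhoods dominated by a single batch.
```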
- czbenchmarks.metrics.utils.jaccard_score(y_true: set[str], y_pred: set[str])[source]
Compute Jaccard similarity between true and predicted values.
- Parameters:
y_true – Set of true values
y_pred – Set of predicted values
- Returns:
Jaccard similarity (size of the intersection divided by size of the union)
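A brief example of comparing two label sets; the cell-type labels are purely illustrative, and the expected value assumes the standard intersection-over-union definition of Jaccard similarity.

```python
from czbenchmarks.metrics.utils import jaccard_score

true_labels = {"T cell", "B cell", "NK cell"}
predicted_labels = {"T cell", "B cell", "monocyte"}

# |intersection| / |union| = 2 / 4 = 0.5
score = jaccard_score(true_labels, predicted_labels)
print(score)
```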
- czbenchmarks.metrics.utils.mean_fold_metric(results_df, metric='accuracy', classifier=None)[source]
Compute mean of a metric across folds.
- Parameters:
results_df – DataFrame containing cross-validation results. Must have columns:
- “classifier”: Name of the classifier (e.g., “lr”, “knn”)
- One of the following metric columns:
  - “accuracy”: For accuracy scores
  - “f1”: For F1 scores
  - “precision”: For precision scores
  - “recall”: For recall scores
metric – Name of metric column to average (“accuracy”, “f1”, etc.)
classifier – Optional classifier name to filter results
- Returns:
Mean value of the metric across folds
- Raises:
KeyError – If the specified metric column is not present in results_df
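A sketch of aggregating cross-validation results using the column layout described above; the fold values and classifier names are illustrative, and the assumption that omitting `classifier` averages over all rows follows from its description as an optional filter.

```python
import pandas as pd
from czbenchmarks.metrics.utils import mean_fold_metric

# Illustrative cross-validation results: one row per (classifier, fold)
results_df = pd.DataFrame(
    {
        "classifier": ["lr", "lr", "lr", "knn", "knn", "knn"],
        "accuracy": [0.91, 0.89, 0.93, 0.84, 0.86, 0.85],
    }
)

# Mean accuracy across folds for the logistic-regression rows only
lr_accuracy = mean_fold_metric(results_df, metric="accuracy", classifier="lr")
print(lr_accuracy)  # 0.91

# Without a classifier filter, the metric is averaged over every row
overall_accuracy = mean_fold_metric(results_df, metric="accuracy")
```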