czbenchmarks.metrics.utils
==========================

.. py:module:: czbenchmarks.metrics.utils


Functions
---------

.. autoapisummary::

   czbenchmarks.metrics.utils.nearest_neighbors_hnsw
   czbenchmarks.metrics.utils.compute_entropy_per_cell
   czbenchmarks.metrics.utils.jaccard_score
   czbenchmarks.metrics.utils.mean_fold_metric


Module Contents
---------------

.. py:function:: nearest_neighbors_hnsw(data: numpy.ndarray, expansion_factor: int = 200, max_links: int = 48, n_neighbors: int = 100) -> tuple[numpy.ndarray, numpy.ndarray]

   Find nearest neighbors using the HNSW algorithm.

   :param data: Input data matrix of shape (n_samples, n_features)
   :param expansion_factor: Size of the dynamic candidate list used during search
   :param max_links: Number of bi-directional links created for every new element
   :param n_neighbors: Number of nearest neighbors to find
   :returns: - Indices array of shape (n_samples, n_neighbors)
             - Distances array of shape (n_samples, n_neighbors)
   :rtype: Tuple containing


.. py:function:: compute_entropy_per_cell(X: numpy.ndarray, labels: Union[pandas.Categorical, pandas.Series, numpy.ndarray]) -> numpy.ndarray

   Compute the entropy of batch labels in local neighborhoods.

   For each cell, finds its nearest neighbors and computes the entropy of the
   batch-label distribution within that neighborhood.

   :param X: Cell embedding matrix of shape (n_cells, n_features)
   :param labels: Batch labels for each cell
   :returns: Array of per-cell entropy values, normalized by the log of the
       number of batches


.. py:function:: jaccard_score(y_true: set[str], y_pred: set[str])

   Compute the Jaccard similarity between the true and predicted sets.

   :param y_true: Set of true values
   :param y_pred: Set of predicted values


.. py:function:: mean_fold_metric(results_df, metric='accuracy', classifier=None)

   Compute the mean of a metric across cross-validation folds.

   :param results_df: DataFrame of cross-validation results. Must have a
       "classifier" column naming the classifier (e.g. "lr", "knn") and at
       least one metric column: "accuracy", "f1", "precision", or "recall".
   :param metric: Name of the metric column to average ("accuracy", "f1", etc.)
   :param classifier: Optional classifier name to filter results by
   :returns: Mean value of the metric across folds
   :raises KeyError: If the specified metric column is not present in results_df
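The normalized entropy that ``compute_entropy_per_cell`` returns can be illustrated on a single neighborhood. This is a minimal sketch of the formula (entropy of the batch-label distribution divided by the log of the number of batches), not the library's implementation; the helper name and the use of natural log are assumptions, and the HNSW neighbor search is omitted.

```python
import numpy as np

# Hypothetical helper, not part of czbenchmarks: entropy of batch labels
# within one neighborhood, normalized by log(n_batches) so values lie in [0, 1].
def neighborhood_entropy(neighbor_labels, n_batches):
    _, counts = np.unique(neighbor_labels, return_counts=True)
    p = counts / counts.sum()          # empirical batch-label distribution
    entropy = -np.sum(p * np.log(p))   # Shannon entropy (natural log)
    return entropy / np.log(n_batches)

# A perfectly mixed two-batch neighborhood scores 1.0;
# a single-batch neighborhood scores 0.0.
mixed = neighborhood_entropy(np.array(["a", "b", "a", "b"]), n_batches=2)
pure = neighborhood_entropy(np.array(["a", "a", "a", "a"]), n_batches=2)
```

High values therefore indicate well-mixed batches in the embedding, which is why this quantity is commonly used as a batch-integration metric.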
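The set-overlap and fold-averaging computations documented above can be sketched as follows. These are hedged stand-ins written for illustration, not the ``czbenchmarks`` functions themselves; the function names, the sample DataFrame, and the cell-type labels are all invented for the example.

```python
import pandas as pd

# Illustrative Jaccard similarity: |intersection| / |union| of two sets.
def jaccard(y_true, y_pred):
    return len(y_true & y_pred) / len(y_true | y_pred)

# Illustrative fold averaging: optionally filter by classifier, then take
# the mean of the requested metric column. A missing column raises KeyError,
# matching the behavior documented for mean_fold_metric.
def mean_fold(results_df, metric="accuracy", classifier=None):
    if classifier is not None:
        results_df = results_df[results_df["classifier"] == classifier]
    return results_df[metric].mean()

# One shared label out of three distinct labels -> similarity of 1/3.
similarity = jaccard({"T cell", "B cell"}, {"B cell", "NK cell"})

# Two "lr" folds with accuracies 0.9 and 0.8 -> mean of 0.85.
folds = pd.DataFrame({
    "classifier": ["lr", "lr", "knn"],
    "accuracy": [0.9, 0.8, 0.7],
})
lr_mean = mean_fold(folds, metric="accuracy", classifier="lr")
```

Filtering before averaging mirrors the ``classifier`` parameter: with ``classifier=None``, the mean is taken over every row in ``results_df`` regardless of classifier.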