Visualizing Benchmark Results

This guide explains how to visualize and interpret benchmark results stored in JSON or YAML formats. It includes an example of using Python to load and plot the results, as well as generating UMAP visualizations for embeddings.

Overview

Benchmark results are JSON, and include metrics like Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and others. This guide demonstrates how to process these results for visualization and generate UMAP embeddings for deeper insights.

Example JSON Output

Here is an example of the JSON structure for benchmark results:

{
    "Example_Model_A": [
        {
            "metric_type": "adjusted_rand_index",
            "value": -4.2435515076099105e-05,
            "params": {}
        },
        {
            "metric_type": "normalized_mutual_info",
            "value": 0.01401707774484544,
            "params": {}
        }
    ]
}

Visualizing Results with Python

You can use Python libraries like json, pandas, and matplotlib to load and visualize the results.

Example Code for Benchmark Metrics

The following code demonstrates how to load benchmark results from a JSON file and plot a bar chart for a specific metric (e.g., Adjusted Rand Index):

import json
import pandas as pd
import matplotlib.pyplot as plt

# Load results from a JSON file
with open("results.json") as f:
    results = json.load(f)

# Flatten the results into a DataFrame
data = []
for model_name, metrics in results.items():
    for metric in metrics:
        data.append({
            "model_name": model_name,
            "metric_type": metric["metric_type"],
            "value": metric["value"]
        })

df = pd.DataFrame(data)

# Filter for a specific metric (e.g., Adjusted Rand Index)
ari_df = df[df['metric_type'] == "adjusted_rand_index"]

# Plot the results
plt.figure(figsize=(10, 6))
plt.bar(ari_df['model_name'], ari_df['value'], color='skyblue')
plt.xlabel("Model")
plt.ylabel("Adjusted Rand Index")
plt.title("Clustering Performance (ARI)")
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

Generating UMAP Visualizations

UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction technique that can be used to visualize high-dimensional embeddings. The following code demonstrates how to generate UMAP visualizations for embeddings stored in a dataset object:

import scanpy as sc

# Assuming `dataset` is an object with an AnnData attribute `adata`
# and a method `get_output` to retrieve the embedding
adata = dataset.adata

# Add the embedding to the AnnData object
adata.obsm['X_emb'] = dataset.get_output(...)  # Replace `...` with the appropriate method arguments

# Compute the neighborhood graph using the embedding
sc.pp.neighbors(adata, use_rep='X_emb')

# Compute UMAP
sc.tl.umap(adata)

# Plot the UMAP
sc.pl.umap(adata, color=['batch', 'cell_type'], title="UMAP Visualization")

Tips for Customization

Replace color=['batch', 'cell_type'] with other metadata fields available in your AnnData object.
Adjust UMAP parameters (e.g., n_neighbors, min_dist) in sc.pp.neighbors for different visualization effects.