Quick Start Guide

Welcome to cz-benchmarks! This guide will help you get started with installation, setup, and running your first benchmark in just a few steps.

Requirements

Before you begin, ensure you have the following installed:

  • 🐍 Python 3.10+: Ensure you have Python 3.10 or later installed.

  • 🐳 Docker: Required for container-based execution.

  • 💻 Hardware: Intel/AMD64 architecture CPU with NVIDIA GPU, running Linux with NVIDIA drivers.
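
The requirements above can be checked before installing. The following is a minimal sketch using only the Python standard library; the function name `check_requirements` is hypothetical, and the Docker and NVIDIA checks are heuristics that only look for the `docker` and `nvidia-smi` executables on the PATH.

```python
from __future__ import annotations

import platform
import shutil
import sys


def check_requirements() -> dict[str, bool]:
    """Return a pass/fail flag for each requirement listed above."""
    return {
        # Python 3.10 or later
        "python>=3.10": sys.version_info >= (3, 10),
        # Docker CLI found on PATH (does not prove the daemon is running)
        "docker": shutil.which("docker") is not None,
        # Linux on an Intel/AMD64 CPU
        "linux": platform.system() == "Linux",
        "amd64": platform.machine().lower() in {"x86_64", "amd64"},
        # nvidia-smi ships with the NVIDIA driver; its presence is only a hint
        "nvidia-driver": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

A `MISSING` line does not necessarily mean the tool is absent, only that it was not found from the current shell environment.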

Installation

You can install the library using one of the following methods:

Option 2: Install from Source (For Development)

If you plan to contribute or debug the library, install it from source:

  1. Clone the repository:

    git clone https://github.com/chanzuckerberg/cz-benchmarks.git
    cd cz-benchmarks
    
  2. Install the package:

    pip install .
    
  3. For development, install in editable mode with development dependencies:

    pip install -e ".[dev]"
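
After either install command, it can be useful to confirm that the package is visible to the interpreter you intend to use. This is a small sketch, assuming the import name is `czbenchmarks` (as in the Python API example in this guide); the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name: str = "czbenchmarks") -> bool:
    """Return True if the module can be found on the current import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed():
        print("czbenchmarks is importable")
    else:
        print("czbenchmarks was not found; check that pip targeted this interpreter")
```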
    

Running Benchmarks

You can run benchmarks using the CLI or programmatically in Python.

💻 Using the CLI

The CLI simplifies running benchmarks. Below are common commands:

πŸ” List Available Benchmark Assets

czbenchmarks list models
czbenchmarks list datasets
czbenchmarks list tasks

πŸƒ Run a Benchmark

czbenchmarks run \
  --models SCVI \
  --datasets tsv2_bladder \
  --tasks clustering \
  --label-key cell_type \
  --output-file results.json

🔧 CLI Run Options

Below are the key options available for running benchmarks via the CLI:

  • --models: Specifies the model to use (e.g., SCVI).

  • --datasets: Specifies the dataset to benchmark (e.g., tsv2_bladder).

  • --tasks: Defines the evaluation task(s) to execute (e.g., clustering).

  • --label-key: The metadata key to use as labels for the task (e.g., cell_type).

  • --output-file: File path to save the benchmark results (e.g., results.json).

💡 Tip: Combine these options to customize your benchmark runs effectively.

📁 Output: Results will be saved to results.json.

📖 Get Help

Use the --help flag to explore available commands and options:

czbenchmarks --help
czbenchmarks <command> --help

🐍 Using the Python API

The library can also be used programmatically. Here's an example:

from czbenchmarks.datasets.utils import load_dataset
from czbenchmarks.runner import run_inference
from czbenchmarks.tasks import ClusteringTask

# Load a dataset
dataset = load_dataset("tsv2_bladder")

# Run inference using the SCVI model
dataset = run_inference("SCVI", dataset)

# Perform clustering on the dataset
clustering = ClusteringTask(label_key="cell_type")
results = clustering.run(dataset)

# Print the clustering results
print(results)
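
To keep the results from a Python session, in the spirit of the CLI's --output-file option, they can be written to disk. The sketch below is not part of the library: `save_results` is a hypothetical helper, and because the type returned by `clustering.run` is not documented here, it falls back to `str()` for anything the `json` module cannot encode.

```python
import json
from pathlib import Path


def save_results(results, path="results.json"):
    """Write results to a JSON file, stringifying anything json cannot encode."""
    path = Path(path)
    # default=str is a blunt fallback: objects json cannot serialize are stored
    # as their str() form. Replace it once the real result type is known.
    path.write_text(json.dumps(results, indent=2, default=str))
    return path
```

Calling `save_results(results)` after the example above would write results.json to the current directory.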

Next Steps

Explore the following resources to deepen your understanding:

Happy benchmarking! 🚀