Quick Start Guide

Welcome to cz-benchmarks! This guide will help you get started with installation, setup, and running your first benchmark in just a few steps.

Requirements

Before you begin, ensure you have the following installed:

  • ๐Ÿ Python 3.10+: Ensure you have Python 3.10 or later installed.

  • ๐Ÿ’ป Hardware: Intel/AMD64 architecture CPU with NVIDIA GPU, running Linux with NVIDIA drivers.
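
To confirm your environment meets these requirements, you can check the Python version and the NVIDIA driver from a terminal (nvidia-smi ships with the NVIDIA drivers):

python3 --version
nvidia-smi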

Installation

You can install the library from source, as described below.

Install from Source (For Development)

If you plan to contribute or debug the library, install it from source:

  1. Clone the repository:

    git clone https://github.com/chanzuckerberg/cz-benchmarks.git
    cd cz-benchmarks
    
  2. Install the package:

    pip install .
    
  3. For development, install in editable mode with development dependencies:

    pip install -e ".[dev]"
    
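After installation, a quick sanity check is to import the package and run one of the CLI commands covered in the next section; both should complete without errors:

python -c "import czbenchmarks"
czbenchmarks list --help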

Command-Line Interface (CLI)

Use the cz-benchmarks CLI to list supported datasets and tasks.

  • List all available datasets:

    czbenchmarks list datasets
    
  • List all available tasks:

    czbenchmarks list tasks
    

For a full list of options, run:

czbenchmarks list --help

Running Benchmarks

The cz-benchmarks package is designed to be used programmatically within your Python workflow. The code example below demonstrates how to run a benchmark task on a model’s output: it generates a “dummy” cell embedding standing in for real model output and computes a benchmark result using the clustering task.

Note that if you are interested in running benchmarks using a CLI instead of code, you can use the Virtual Cell Platform CLI. The VCP CLI supports:

  • Running a benchmark on any Virtual Cell Platform model that has been benchmarked using a cz-benchmarks dataset, allowing you to reproduce a VCP-published result.

  • Running a benchmark on any Virtual Cell Platform model using your own benchmarking dataset.

  • Running a benchmark on the output of your own model, using either a cz-benchmarks dataset or your own benchmarking dataset.

import numpy as np
from czbenchmarks.datasets import load_dataset
from czbenchmarks.tasks import ClusteringTask, ClusteringTaskInput

# 1. Load a benchmark dataset
# This dataset has pre-defined labels we can use for evaluation.
dataset = load_dataset("tsv2_bladder")

# 2. Generate or load your model's cell embedding
# For this example, we'll generate a dummy embedding.
# In a real scenario, this would be the output of your ML model.
n_obs = dataset.adata.n_obs
n_features = 128
my_model_embedding = np.random.rand(n_obs, n_features)

# 3. Instantiate the desired evaluation task
# We'll use the ClusteringTask to see how well the embedding
# separates known cell types.
clustering_task = ClusteringTask()

# 4. Prepare the input for the task
# The task needs the ground-truth labels from the dataset.
task_input = ClusteringTaskInput(
    obs=dataset.adata.obs,
    input_labels=dataset.labels
)

# 5. Run the task and get the results
# The task evaluates your embedding against the ground-truth labels.
results = clustering_task.run(
    cell_representation=my_model_embedding,
    task_input=task_input
)

# 6. Print the results
# The output will contain metrics like Adjusted Rand Index (ARI)
# and Normalized Mutual Information (NMI).
print(results)
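
The dummy embedding above should score poorly, since it is random noise. Any array of shape (n_obs, n_features) can be evaluated the same way; as a rough, illustrative baseline (not part of the cz-benchmarks API), you could substitute a PCA-style embedding computed with scikit-learn's TruncatedSVD, assuming scikit-learn is installed:

from sklearn.decomposition import TruncatedSVD

# Illustrative baseline: a 50-component SVD of the raw expression matrix.
# TruncatedSVD accepts both dense and sparse matrices, so it works
# directly on dataset.adata.X.
baseline_embedding = TruncatedSVD(n_components=50).fit_transform(dataset.adata.X)

# Evaluate the baseline with the same task and task input as above.
baseline_results = clustering_task.run(
    cell_representation=baseline_embedding,
    task_input=task_input,
)
print(baseline_results)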

Next Steps

Explore the rest of the documentation to deepen your understanding.

Happy benchmarking! 🚀