Model
The VCP CLI model command allows you to interact with models previously submitted to the Virtual Cells Platform.
On this page, we provide instructions on how to list and download models using the VCP CLI tool. We also describe how to run downloaded models locally using MLflow.
Why Download Models?
You can use downloaded models to run inference on multiple datasets, inspect model behavior under different parameters, and obtain raw inference outputs for downstream analyses. This enables you to run models on your own compute and test models beyond pre-defined tasks and datasets provided through the cz-benchmarks package.
Getting Started
Prerequisites
To use the VCP CLI model commands, ensure you have the following prerequisites installed:
Python version >= 3.10, <= 3.13
The VCP CLI tool. See Installation for instructions.
If you plan on running downloaded models locally, we recommend installing uv to manage your virtual environment. You will also need to install MLflow to run model inference. In addition, you may need to review the requirements for the model(s) of interest, including hardware and software dependencies.
See Run Downloaded Models for more details.
Get Help Using the CLI
The --help option provides additional documentation and tips. You can add it to the end of any of the available commands for more information.
For example, to learn what model commands are available for this tool, run:
vcp model --help
The VCP CLI has two core model commands:

| Command | Description |
|---|---|
| `list` | List all download-enabled models with their versions and variants. |
| `download` | Download a specific model version and variant to your local filesystem. |
Note
You do not need to be logged in to list or download models.
List Models
List all available download-enabled models, displaying their names, versions, and available variants.
Basic Usage
vcp model list
Options
| Option | Description | Default |
|---|---|---|
| `--format` | Output format: `table` or `json` | `table` |
Examples
List models in table format:
vcp model list
Output models as JSON:
vcp model list --format json
Output
The table output displays:
Model Name: The identifier for the model
Version: Available versions (e.g., v1, v2, 2024-01-15)
Variants: Available variants for multi-variant models (e.g., organism-specific versions like homo_sapiens, mus_musculus)
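If you want to use the listing in a script, you can capture the JSON output and parse it in Python. The sketch below is illustrative: it assumes the vcp executable is on your PATH and makes no assumptions about the field names in the JSON output.

# Capture the JSON listing from the CLI and parse it in Python
import json
import subprocess

result = subprocess.run(
    ["vcp", "model", "list", "--format", "json"],
    capture_output=True, text=True, check=True,
)
models = json.loads(result.stdout)
print(json.dumps(models, indent=2))  # pretty-print whatever structure the CLI returns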
Download Models
Download a specific model version and variant to your local filesystem.
Basic Usage
vcp model download --model <MODEL_NAME> --version <MODEL_VERSION>
Required Options
| Option | Description |
|---|---|
| `--model` | Name of the model to download |
| `--version` | Version of the model to download |
Optional Options
| Option | Description | Default |
|---|---|---|
| `--output` | Directory to save the downloaded model | Current working directory |
| `--variant` | Variant name (e.g., `homo_sapiens`) | Auto-selected if only one variant is available |
Examples
Download a single-variant model:
vcp model download --model my-model --version v1 --output ./models
Download with specific variant:
vcp model download --model my-model --version v1 --variant homo_sapiens --output ./models
Understanding Variants
Some models are available in multiple variants (e.g., organism-specific versions).
If a model has only one variant, it will be selected automatically. You’ll see a message about the auto-selected variant. For example:
✓ Auto-selected variant: homo_sapiens
If multiple variants are available and `--variant` is not specified, you'll see a helpful panel listing the available variants. Example output:

╭─── Variant Selection Required ─────╮
│ Multiple variants available for    │
│ my-model v1                        │
│                                    │
│ Available variants:                │
│ • homo_sapiens                     │
│ • mus_musculus                     │
│                                    │
│ Please specify a variant:          │
│ vcp model download my-model v1     │
│ --variant <variant_name>           │
╰────────────────────────────────────╯
Output Structure
Downloaded models are saved to a directory with the following naming pattern:
Single variant: `{model}-{version}/`
Multi-variant: `{model}-{version}-{variant}/`
Example: For a model called my-model, version v1, and variant homo_sapiens, the output directory structure will be:
./models/
└── my-model-v1-homo_sapiens/
├── model.tar.gz
└── metadata.yaml
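To confirm that a download completed successfully, you can check for the files shown above. A minimal sketch, assuming the example layout (the exact contents of the folder may vary by model):

# Check that the downloaded model folder contains the expected files
from pathlib import Path

model_dir = Path("./models/my-model-v1-homo_sapiens")  # adjust to your --output location
for name in ["model.tar.gz", "metadata.yaml"]:
    print(name, "found" if (model_dir / name).exists() else "missing")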
Run Downloaded Models
To run downloaded models on your local system, you will need to:
review system requirements for model(s) of interest, including hardware and software dependencies
install MLflow
prepare an input JSON file
Below, we provide details for each step, including commands to run model inference. In addition, we demonstrate how to run inference using a TranscriptFormer model variant.
Review Model Requirements
Find hardware and software requirements for each model within the corresponding model card or GitHub repository to ensure the availability of compute resources. For example, some models may require GPUs with specific capabilities (e.g., CUDA version) or a minimum amount of RAM. In addition, you may need to install additional software packages or libraries depending on the model.
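As a quick sanity check, you can confirm whether an NVIDIA GPU driver is visible on your machine before installing anything. This is only an illustrative check and does not replace the requirements listed on the model card:

# Illustrative environment check: print NVIDIA driver/CUDA information if nvidia-smi is available
import shutil
import subprocess

if shutil.which("nvidia-smi"):
    print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
else:
    print("nvidia-smi not found; no NVIDIA GPU driver detected on this machine")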
Install MLflow
To use a downloaded model, create a virtual environment and install MLflow.
Tip
We recommend using uv to manage your virtual environment. This will ensure reproducibility while quickly managing complex dependencies.
Note that you can simply activate your virtual environment using source vcp-cli/bin/activate if you installed the VCP CLI using pip install 'vcp-cli[all]'.
# create virtual environment and install MLflow
uv venv
uv pip install mlflow
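To confirm that MLflow is available in the environment you just created, you can run a quick import check (for example, with uv run python):

# Optional sanity check: confirm MLflow is importable and print its version
import mlflow

print(mlflow.__version__)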
Generate Input JSON File
Each model run requires an input JSON file that specifies necessary parameters and a path to your dataset.
There are two options to create the input JSON file:
Option 1: After downloading a model, you will find a serving_input_example.json file within the model folder that includes the parameters necessary to run the model. Edit the serving_input_example.json file directly to specify the dataset path under data and, if needed, adjust the default parameters. (A programmatic version of this approach is sketched after the example below.)

Option 2: Create the input JSON file programmatically. You can start with the example below, making sure to edit the data and params fields to specify the dataset path and the parameters relevant to your model, respectively.
# create the input JSON file and write it to disk
import json

path_to_dataset = "/path/to/your/dataset"  # replace with the directory containing your .h5ad file

input_json = {
    "dataframe_split": {
        "columns": [
            "input_uri"
        ],
        "data": [
            [
                f"{path_to_dataset}/filename.h5ad"
            ]
        ]
    },
    "params": {
        "batch_size": 32,
        "precision": "16-mixed",
        "gene_col_name": "ensembl_id"
    }
}

with open("input.json", "w") as f:
    json.dump(input_json, f)
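For Option 1, you can also update the provided serving_input_example.json programmatically instead of editing it by hand. The sketch below assumes the file sits in the download folder and follows the dataframe_split layout shown in the examples that follow; the exact location and contents may vary by model (for example, the file may be packaged inside the model archive).

# Sketch: point the model's example input at your dataset and save it as input.json
import json
from pathlib import Path

model_dir = Path("./models/my-model-v1")  # adjust to your download location
example = json.loads((model_dir / "serving_input_example.json").read_text())
example["dataframe_split"]["data"] = [["/path/to/your/dataset/filename.h5ad"]]  # replace with your dataset path

with open("input.json", "w") as f:
    json.dump(example, f)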
The examples below show input JSON files specifying parameters for running TranscriptFormer and scVI. Note the difference under the params field.
TranscriptFormer Input JSON File
{
    "dataframe_split": {
        "columns": [
            "input_uri"
        ],
        "data": [
            [
                "{path_to_dataset}/filename.h5ad"
            ]
        ]
    },
    "params": {
        "batch_size": 32,
        "precision": "16-mixed",
        "gene_col_name": "ensembl_id"
    }
}
scVI Input JSON File
{
    "dataframe_split": {
        "columns": [
            "input_uri"
        ],
        "data": [
            [
                "{path_to_dataset}/filename.h5ad"
            ]
        ]
    },
    "params": {
        "organism": "human",
        "return_dist": true
    }
}
Run Model Inference
Run model inference using the following command:
# Specify model download directory, output filename (JSON), and environment manager
mlflow models predict \
--model-uri <path_to_model_download_folder> \
--content-type json \
--input-path input.json \
--output-path <test_output.json> \
--env-manager <virtualenv-manager>
You can check the MLflow documentation for more details about the mlflow models predict command. Note that the --env-manager flag supports virtualenv, conda, local, and uv; we recommend using uv.
Example workflow
If you are using uv/pip as recommended, the workflow to run model inference would include the following steps:
Step 1: Download model
vcp model download --model my-model --version v1 --output ./models
Step 2: Create input JSON file (named here ‘input.json’)
import json

path_to_dataset = "/path/to/your/dataset"  # replace with the directory containing your .h5ad file

input_json = {
    "dataframe_split": {
        "columns": [
            "input_uri"
        ],
        "data": [
            [
                f"{path_to_dataset}/filename.h5ad"
            ]
        ]
    },
    "params": {
        "batch_size": 32,
        "precision": "16-mixed",
        "gene_col_name": "ensembl_id"
    }
}

with open("input.json", "w") as f:
    json.dump(input_json, f)
Step 3: Run model inference
mlflow models predict \
--model-uri ./models/my-model-v1 \
--content-type json \
--input-path input.json \
--output-path <test_output.json> \
--env-manager uv
Note
If you encounter issues while running inference, please refer to the appropriate model card for contact information.
Example: Running Inference with TranscriptFormer
In this example, we walk through the steps to run model inference with TranscriptFormer using uv and pip. We will download the tf_sapiens model variant to generate cell embeddings from human data. For more complex examples of what can be done with TranscriptFormer, see the quickstart and GitHub repository.
Step 1: Download model
TranscriptFormer has multiple variants depending on the type of data used for training. For this example, we will run inference on human data using the tf_sapiens variant.
# Download tf-sapiens model variant
vcp model download --model transcriptformer --version v0.6.0 --variant tf_sapiens --output <path_to_model_download_folder>
# Change your working directory to the downloaded `transcriptformer-v0.6.0-tf_sapiens` folder,
# since the inference command in Step 4 uses `--model-uri .`
cd <path_to_model_download_folder>/transcriptformer-v0.6.0-tf_sapiens
Step 2: Download dataset
In this example, we used human lung data from the Tabula Sapiens dataset. Click here to download the dataset. We saved the dataset in the transcriptformer-v0.6.0-tf_sapiens folder and named it TS_lung.h5ad.
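Before creating the input file, it can help to confirm that the dataset contains the gene identifier column the model expects, since gene_col_name is set to ensembl_id below. A small sketch, assuming the anndata package is installed in your environment:

# Optional check: inspect the AnnData file and confirm an `ensembl_id` column exists in .var
import anndata as ad

adata = ad.read_h5ad("./TS_lung.h5ad")
print(adata)  # summary of cells, genes, and annotations
print("ensembl_id in var:", "ensembl_id" in adata.var.columns)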
Step 3: Create input JSON file
The following input.json file contains the necessary default parameters for running the TranscriptFormer model: batch_size, precision, and gene_col_name. Note that we included the path to the dataset under the data field.
import json

input_json = {
    "dataframe_split": {
        "columns": [
            "input_uri"
        ],
        "data": [
            [
                "./TS_lung.h5ad"
            ]
        ]
    },
    "params": {
        "batch_size": 32,
        "precision": "16-mixed",
        "gene_col_name": "ensembl_id"
    }
}

with open("input.json", "w") as f:
    json.dump(input_json, f)
Step 4: Run inference
Use the following command to run inference and obtain cell embeddings:
mlflow models predict \
--model-uri . \
--content-type json \
--input-path input.json \
--output-path tf-sapiens-lung_output.json \
--env-manager uv
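Once inference finishes, you can load the output file and inspect its structure. The exact schema of the output JSON depends on the model and on how MLflow serializes predictions, so the sketch below only reports what it finds:

# Inspect the inference output without assuming a particular schema
import json

with open("tf-sapiens-lung_output.json") as f:
    output = json.load(f)

if isinstance(output, dict):
    print("top-level keys:", list(output.keys()))
else:
    print("top-level type:", type(output).__name__)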
For more information about the capabilities of TranscriptFormer, see the GitHub documentation here.