Command Line Interface

vcp

VCP CLI - A command-line interface (CLI) to the Chan Zuckerberg Initiative’s Virtual Cell Platform (VCP)

vcp [OPTIONS] COMMAND [ARGS]...

benchmarks

View and run benchmarks available on the Virtual Cell Platform

vcp benchmarks [OPTIONS] COMMAND [ARGS]...

get

Fetch and display benchmark results from the API.

Use filters to select by model, dataset, or task. Choose to show model, baseline, or both metrics.

vcp benchmarks get [OPTIONS]

Options

-b, --benchmark-key <benchmark_key>

Filter by benchmark key (substring match with ‘*’ wildcards, e.g. ‘scvi*v1-tsv2*liver-label*pred’)

-m, --model-filter <model_filter>

Filter by model key (substring match with ‘*’ wildcards, e.g. ‘scvi*v1’)

-d, --dataset-filter <dataset_filter>

Filter by dataset key (substring match with ‘*’ wildcards, e.g. ‘tsv2*liver’)

-t, --task-filter <task_filter>

Filter by task key (substring match with ‘*’ wildcards, e.g. ‘label*pred’)

-f, --format <format>

Output format

Options:

table | json
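
The filter options above take patterns with `*` wildcards. The sketch below shows how such a pattern might be tested locally, and how to quote it so the shell passes it to `vcp` unchanged. It assumes the match behaves like a shell glob with an implicit `*` on each side, which the help text suggests but does not spell out. The helper `matches_filter` and the sample key are hypothetical.

```shell
# matches_filter KEY PATTERN
# Assumed semantics: substring match where '*' matches any run of characters.
matches_filter() {
    case "$1" in
        *$2*) return 0 ;;   # $2 is left unquoted so its '*' acts as a wildcard
        *)    return 1 ;;
    esac
}

# Hypothetical benchmark key, used only to exercise the helper.
key='scvi-v1-tsv2-liver-label-pred'

if matches_filter "$key" 'scvi*v1'; then
    echo "model filter matches"
fi

# Single quotes stop the shell from expanding '*' against local filenames.
# The command is printed rather than run, so the sketch works without vcp installed.
printf '%s\n' "vcp benchmarks get --model-filter 'scvi*v1' --task-filter 'label*pred' --format json"
```

Quoting matters most when the current directory contains files whose names happen to match the pattern.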

list

List available model, dataset and task benchmark combinations. You can filter results by dataset, model, or task using glob patterns.

vcp benchmarks list [OPTIONS]

Options

-b, --benchmark-key <benchmark_key>

Filter by benchmark key

-m, --model-filter <model_filter>

Filter by model key (substring match with ‘*’ wildcards, e.g. ‘scvi*v1’).

-d, --dataset-filter <dataset_filter>

Filter by dataset key (substring match with ‘*’ wildcards, e.g. ‘tsv2*liver’).

-t, --task-filter <task_filter>

Filter by task key (substring match with ‘*’ wildcards, e.g. ‘label*pred’).

-f, --format <format>

Output format

Options:

table | json

run

Run a benchmark task on a model and dataset.

Use a VCP model (--model-key) or a precomputed cell representation (--cell-representation). Datasets can be VCP datasets (--dataset-key) or user datasets (--user-dataset).

vcp benchmarks run [OPTIONS]

Options

-b, --benchmark-key <benchmark_key>

Shortcut for specifying model, dataset, and task together. Format: MODEL-DATASET-TASK (e.g., scvi_homo_sapiens-tsv2_blood-cell_type_annotation).

-m, --model-key <model_key>

Model key from the registry (e.g., scvi_homo_sapiens).

-d, --dataset-key <dataset_key>

Dataset key from czbenchmarks datasets (e.g., tsv2_blood). Can be used multiple times.

-u, --user-dataset <user_dataset>

Path to a user-provided .h5ad file. Provide as a JSON string with keys: ‘dataset_class’, ‘organism’, and ‘path’. Example: '{"dataset_class": "czbenchmarks.datasets.SingleCellLabeledDataset", "organism": "HUMAN", "path": "~/mydata.h5ad"}'. Can be used multiple times.

-t, --task-key <task_key>

Benchmark task to run (choose from available tasks).

-c, --cell-representation <cell_representation>

Path to precomputed cell embeddings (.npy file) or AnnData reference (e.g., ‘@X’, ‘@obsm:X_pca’). Can be used multiple times.

-l, --baseline-args <baseline_args>

JSON string with parameters for the baseline computation.

-r, --random-seed <random_seed>

Set a random seed for reproducibility.

-n, --no-cache

Disable caching. Forces all steps to run from scratch.

--labels <labels>

The .obs column with cell type labels. Supports both column name (‘cell_type’) and reference (‘@obs:cell_type’) formats (default: ‘cell_type’).

--use-rep <use_rep>

[Clustering Task] Representation to use for clustering (default: ‘X’).

--n-iterations <n_iterations>

[Clustering Task] Number of Leiden algorithm iterations (default: 2).

--flavor <flavor>

[Clustering Task] Flavor of Leiden algorithm (default: ‘igraph’).

Options:

leidenalg | igraph

--key-added <key_added>

[Clustering Task] Key for storing cluster assignments (default: ‘leiden’).

--n-folds <n_folds>

[Label Prediction Task] Number of cross-validation folds (default: 5).

--min-class-size <min_class_size>

[Label Prediction Task] Minimum samples per class for inclusion (default: 10).

--batch-column, --batch-labels <batch_column>

[Batch Integration Task] The .obs column with batch information (default: ‘batch’).

--cross-species-organisms <cross_species_organisms>

[Cross-Species] Organism name (e.g., ‘homo_sapiens:ENSG’). Repeat for each dataset in order.

--cross-species-labels <cross_species_labels>

[Cross-Species] Cell type labels column for each dataset. Supports both column name (‘cell_type’) and reference (‘@obs:cell_type’) formats. Repeat for each dataset in order.
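
As a rough sketch, the `--benchmark-key` shortcut and the `--user-dataset` JSON string described above can be assembled in a shell script. The helper names `benchmark_key` and `user_dataset_json` are hypothetical, the JSON helper does no escaping (so it assumes values without quotes or backslashes), and the task key is borrowed from the `--benchmark-key` example. The commands are printed rather than executed.

```shell
# benchmark_key MODEL DATASET TASK -> MODEL-DATASET-TASK
benchmark_key() {
    printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# user_dataset_json DATASET_CLASS ORGANISM PATH
# No escaping is performed: values must not contain '"' or '\'.
user_dataset_json() {
    printf '{"dataset_class": "%s", "organism": "%s", "path": "%s"}\n' "$1" "$2" "$3"
}

key=$(benchmark_key scvi_homo_sapiens tsv2_blood cell_type_annotation)
dataset=$(user_dataset_json czbenchmarks.datasets.SingleCellLabeledDataset HUMAN '~/mydata.h5ad')

# Print the commands for review instead of running them.
printf '%s\n' "vcp benchmarks run --benchmark-key $key"
printf '%s\n' "vcp benchmarks run --model-key scvi_homo_sapiens --user-dataset '$dataset' --task-key cell_type_annotation"
```

Wrapping the JSON in single quotes on the command line keeps its inner double quotes intact.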

cache

Manage vcp-cli cache and upload state.

The cache is used for embeddings, benchmark results, uploads (to allow resuming upon interruption), and new version checks.

Examples:
vcp cache info # Show cache information
vcp cache clear --type uploads # Clear upload states
vcp cache history --limit 5 # Show last 5 uploads
vcp cache clean-uploads -m model1 -v 1.0 # Clean specific upload

vcp cache [OPTIONS] COMMAND [ARGS]...

clean-uploads

Clean specific upload states from cache.

vcp cache clean-uploads [OPTIONS]

Options

-m, --model <model>

Clean uploads for specific model

-v, --version <version>

Clean uploads for specific version

-f, --force

Skip confirmation prompt

clear

Clear cache data.

vcp cache clear [OPTIONS]

Options

-t, --type <type>

Type of cache to clear

Options:

all | uploads | benchmarks | version

-f, --force

Skip confirmation prompt

history

Show upload history from cache.

vcp cache history [OPTIONS]

Options

-l, --limit <limit>

Number of recent uploads to show

-v, --verbose

Show detailed information

info

Show cache information and statistics.

vcp cache info [OPTIONS]

config

Print the current configuration.

vcp config [OPTIONS]

Options

-c, --config <config>

Path to config file

-f, --format <format>

Output format

Options:

yaml | json

--show-secrets, --hide-secrets

Show sensitive information

-i, --init

Initialize a configuration if one does not exist

data

Data-related commands

vcp data [OPTIONS] COMMAND [ARGS]...

Data usage workflow

  1. List searchable fields: vcp data search --help

  2. Survey datasets based on a searchable field: vcp data summary $FIELD

  3. Search for datasets: vcp data search '$FIELD:$VALUE'

     3.a … Restrict search: vcp data search '$FIELD:$VALUE_1 AND $FIELD:$VALUE_2'

     3.b … Expand search: vcp data search '($FIELD:$VALUE_1 OR $FIELD:$VALUE_3) AND ($FIELD:$VALUE_3)'

  4. Describe a single dataset: vcp data describe $DATASET_ID

  5. Download:

     5.a … a single dataset: vcp data download --id $DATASET_ID

     5.b … multiple datasets matching search: vcp data search $QUERY --download
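
The workflow above can be sketched as a short script. The field names come from the `vcp data summary` field list. The values (`brain`, `human`) are illustrative assumptions, and the helper functions are hypothetical. Commands are printed rather than run.

```shell
# field_term FIELD VALUE -> FIELD:VALUE
field_term() {
    printf '%s:%s\n' "$1" "$2"
}

# and_terms TERM_1 TERM_2 -> TERM_1 AND TERM_2
and_terms() {
    printf '%s AND %s\n' "$1" "$2"
}

# Illustrative values only; `vcp data summary FIELD` shows the real ones.
query=$(and_terms "$(field_term tissue brain)" "$(field_term organism human)")

# Print each step of the workflow for review.
printf '%s\n' "vcp data summary tissue"
printf '%s\n' "vcp data search '$query'"
printf '%s\n' "vcp data search '$query' --download"
```

Keeping the query in one variable makes it easy to inspect the search results first and add `--download` only once they look right.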

credentials

Get the credentials for a specific dataset by id. If you do not know the id, first use the search command to find the id.

vcp data credentials [OPTIONS] DATASET_ID

Arguments

DATASET_ID

Required argument

describe

Describe a dataset with comprehensive metadata in tabular format.

Displays:
• Basic Information (name, version, license, owner, DOI)
• Biological Metadata (assays, organisms, tissues, diseases, etc.)
• Distribution/Assets (download locations and formats)

vcp data describe [OPTIONS] DATASET_ID

Options

--full

Show the complete DatasetRecord as pretty-printed JSON and exit.

--raw

Show the raw returned record.

Arguments

DATASET_ID

Required argument

download

Download dataset(s) by ID or search query. At least one of --id or --query is required.

vcp data download [OPTIONS]

Options

--id <dataset_id>

Dataset ID to download a single dataset

-q, --query <query>

Search query to download multiple datasets

-o, --outdir <outdir>

Directory to write the files.

-e, --exact

Use exact match when passing --query

Examples:

  • Download a SINGLE dataset by its exact ID

    vcp data download --id $DATASET_ID

  • Download MULTIPLE datasets matching a search query

    vcp data download --query $QUERY

    … equivalent to vcp data search $QUERY --download

metadata-list

List available metadata fields for searching datasets.

vcp data metadata-list [OPTIONS]

preview

Generate a Neuroglancer preview URL for a dataset with zarr files.

DATASET_ID: The ID of the dataset to preview

vcp data preview [OPTIONS] DATASET_ID

Options

--open

Automatically open the preview URL in your browser

Arguments

DATASET_ID

Required argument

summary

Summarize counts of matched datasets against a specified FIELD.

vcp data summary [OPTIONS] {assay|assay_ontology_term_id|tissue|tissue_ontology_term_id|organism|organism_ontology_term_id|disease|disease_ontology_term_id|tissue_type|cell_type|development_stage|development_stage_ontology_term_id}

Options

-q, --query <query>

Search query to filter datasets.

Arguments

FIELD

Required argument

Examples:

  • Summarize by a Cross Modality field

    vcp data summary assay

    vcp data summary assay_ontology_term_id

  • Filter Summary by an additional search query

    vcp data summary assay --query brain

login

Login to the Virtual Cell Platform

vcp login [OPTIONS]

Options

-c, --config <config>

Path to configuration file

-f, --force

Force login even if valid tokens exist

-v, --verbose

Enable verbose output

-u, --username <username>

Username for direct login

logout

Logout from the Virtual Cell Platform

vcp logout [OPTIONS]

Options

-c, --config <config>

Path to configuration file

-v, --verbose

Enable verbose output

model

Manage models in the Virtual Cell Platform.

Available commands:

• init - Initialize a new model project
• list - List available models
• download - Download a specific model version
• submit - Submit model metadata to the VCP Model Hub
• stage - Stage model data files to Contributions Store
• status - Query and display the status of all model submissions
• validate-model-metadata - Validate metadata for a model
• assist - Check model submission status and get step-by-step guidance

vcp model [OPTIONS] COMMAND [ARGS]...

assist

Check model submission status and get guidance for VCP model commands.

This command helps you understand where you are in the model submission process and what the next steps should be.

Examples:
• vcp model assist # Check current directory status
• vcp model assist --work-dir ./my-model # Check specific work directory
• vcp model assist --step init # Get init step guidance
• vcp model assist --step metadata # Get metadata step guidance
• vcp model assist --step stage # Get stage step guidance
• vcp model assist --step submit # Get submit step guidance

vcp model assist [OPTIONS]
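
To read the guidance for each stage in turn, the `--step` values shown in the examples above can be looped over. This is a minimal sketch: the order is taken from those examples, and `assist_commands` is a hypothetical helper that prints the commands rather than running them.

```shell
# Print one `vcp model assist` command per workflow step.
assist_commands() {
    for step in init metadata stage submit; do
        printf 'vcp model assist --step %s\n' "$step"
    done
}

assist_commands
```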

Options

--work-dir <work_dir>

Path to the model repository work directory (defaults to current directory)

--step <step>

Get detailed guidance for a specific workflow step

Options:

init | status | metadata | weights | package | stage | submit

-c, --config <config>

Path to config file

download

Download a specific version of a model using presigned S3 URLs.

vcp model download [OPTIONS]

Options

--model <model>

Required. Name of the model to download

--version <version>

Required. Version of the model to download

--output <output>

Required. Directory to save the downloaded model

-c, --config <config>

Path to config file

--timeout <timeout>

Timeout in seconds for download (default: 1800)

-v, --verbose

Show detailed debug information

--max-workers <max_workers>

Maximum number of concurrent downloads (default: 4)

init

Initialize a new model in the VCP Model Hub API.

Runs in interactive mode by default when no parameters are provided. Use --work-dir to specify where to initialize the model repository.

vcp model init [OPTIONS]

Options

--model-name <model_name>

Name of the model to initialize.

--model-version <model_version>

Version of the model to initialize.

--license-type <license_type>

License type for the model (e.g., MIT, Apache-2.0).

--work-dir <work_dir>

Path to the model repository work directory where the model will be initialized.

--data-file <data_file>

JSON file containing template data to prefill answers.

--interactive

Run in interactive mode to prompt for required parameters (default when no parameters provided).

--workflow-help

Show workflow guidance and next steps after initialization.

-v, --verbose

Enable verbose output for debugging and detailed information.

--debug

Enable debug output (includes sensitive information - use with caution).

--debug-file <debug_file>

Write debug output to file (automatically masks sensitive data for safe sharing).

--iteration

Indicate this is an iteration on a previously submitted model.

--skip-git

Skip git operations (clone, pull, branch creation) for faster initialization.

list

List available models and their versions from the VCP Model Hub API.

vcp model list [OPTIONS]

Options

--format <format>

Output format (table or json)

Options:

table | json

-c, --config <config>

Path to config file

-v, --verbose

Show detailed debug information

stage

Stage model files for upload to the VCP Model Hub.

This command uploads files to S3 using the new model_data path structure:
- Files are uploaded to: s3://bucket/model/version/model_data/
- Creates .ptr pointer files with metadata for each uploaded file
- Deletes original files after successful upload
- Automatically resumes from previous failed upload attempts (use --no-resume to start fresh)
- Filters out hidden files (starting with .)

The upload process:
1. Scan directory and create initial pointer files
2. Get presigned URLs for each file
3. Upload files sequentially with immediate metadata updates
4. Delete original files after successful upload

Work Directory:
- Use --work-dir to specify the location of model configuration file
- If not provided, checks current directory first, then prompts for work directory
- The command will look for model_data directory within the work directory structure

Examples:
- vcp model stage --work-dir /path/to/model/repo
- vcp model stage --model my-model --version v1.0.0 --data-path ./model_data
- vcp model stage --work-dir /path/to/repo --verbose

vcp model stage [OPTIONS]
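
Because staging deletes the original files after a successful upload and skips hidden files, it can help to list what would be picked up beforehand. The following is a minimal sketch, assuming the data lives in a `model_data` directory and that nested files are included (the description above does not say whether the scan is recursive). `stage_candidates` is a hypothetical helper, not part of vcp.

```shell
# stage_candidates DIR
# List regular files under DIR whose names do not start with '.'.
# Files inside hidden directories are still listed by this sketch.
stage_candidates() {
    find "$1" -type f ! -name '.*' | sort
}

# Example: review the files before running `vcp model stage`.
if [ -d ./model_data ]; then
    stage_candidates ./model_data
fi
```

Copying the listed files elsewhere first is one way to keep the originals, since the command removes them once the upload succeeds.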

Options

--model <model>

Name of the model

--version <version>

Version of the model

--data-path <data_path>

Directory path containing model files

--work-dir <work_dir>

Path to the model repository work directory where model configuration is located

-c, --config <config>

Path to config file

-v, --verbose

Show detailed debug information

--max-retries <max_retries>

Maximum number of retry attempts per file (default: 3)

--no-resume

Start fresh without resuming from previous upload attempt

--clean-state

Clean previous upload state and start fresh

--batch-upload

[DEPRECATED] Batch upload is now the default behavior

--interactive

Run in interactive mode to prompt for required parameters.

status

Query and display the status of all model submissions from the VCP Model Hub.

This command shows the current status of all models that have been initialized or submitted to the VCP Model Hub.

Examples:
• vcp model status # Show all submissions in table format
• vcp model status --format json # Show all submissions in JSON format
• vcp model status --verbose # Show detailed debug information

vcp model status [OPTIONS]

Options

--format <format>

Output format (table or json)

Options:

table | json

-c, --config <config>

Path to config file

-v, --verbose

Show detailed debug information

--work-dir <work_dir>

Path to the model repository work directory where model configuration is located

submit

Submit model for review with comprehensive validation.

This command performs the following operations:
1. Validates that init command was run (model configuration file exists)
2. Validates that stage command was run (only .ptr files in model_data)
3. Validates that no large files (>5GB) remain
4. Submits model data to VCP Model Hub API
5. Creates submission for model review

The command reads submission data from:
- model_card_docs/model_card_metadata.yaml file in the repository

Expected model card metadata structure:
- model_display_name: Model name for submission
- model_version: Version (vX.X.X or YYYY-MM-DD format)
- licenses.name: License type
- repository_link: Model repository URL
- authors: List of authors with name field
- model_description: Detailed model description

Work Directory:
- Use --work-dir to specify the location of model configuration file
- If not provided, checks current directory first, then prompts for work directory
- The command will look for model_data directory within the work directory structure

Examples:
- vcp model submit --work-dir /path/to/model/repo
- vcp model submit --work-dir /path/to/repo --skip-git
- vcp model submit --work-dir /path/to/repo --verbose

vcp model submit [OPTIONS]
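
Two of the checks described above can be approximated locally before submitting. This is a sketch under assumptions: the version pattern treats each X in vX.X.X as one or more digits, the date check validates only the shape (not the calendar), and both helper names are hypothetical rather than part of vcp.

```shell
# is_valid_version VERSION
# Succeeds for vX.X.X or YYYY-MM-DD shaped strings.
is_valid_version() {
    printf '%s\n' "$1" |
        grep -Eq '^(v[0-9]+\.[0-9]+\.[0-9]+|[0-9]{4}-[0-9]{2}-[0-9]{2})$'
}

# unstaged_files DIR
# List non-hidden files under DIR that are not .ptr pointer files.
unstaged_files() {
    find "$1" -type f ! -name '*.ptr' ! -name '.*' | sort
}

# Try the version check on a few sample strings.
for version in v1.0.0 2024-05-01 1.0; do
    if is_valid_version "$version"; then
        echo "$version: ok"
    else
        echo "$version: unexpected format"
    fi
done
```

If `unstaged_files model_data` prints anything, the stage step has probably not finished for those files.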

Options

-c, --config <config>

Path to config file

-v, --verbose

Show detailed debug information

--work-dir <work_dir>

Path to the model repository work directory where model configuration is located

--skip-git

Skip submission operations

validate-model-metadata

Validates model metadata files against requirements in vcp-model-hub.

vcp model validate-model-metadata [OPTIONS]

Options

--yaml-metadata-file <yaml_metadata_file>

Path to the YAML metadata file (default: model_card_metadata.yaml)

--markdown-details-file <markdown_details_file>

Path to the Markdown details file (default: model_card_details.md)

-c, --config <config>

Path to config file

-v, --verbose

Show detailed debug information

version

Show version information and optionally check for updates.

vcp version [OPTIONS]

Options

--check

Check for available updates on PyPI