Roadmap
Welcome to the public roadmap for the cz-benchmarks package, a collaborative effort to enable the scientific community to conduct reproducible, biologically-relevant benchmarking of AI models for biology, across domains.
The roadmap reflects our initial priorities and sequencing of work. Our goal in sharing this roadmap is to enable community engagement and contribution through transparency; without this, we know that our work here cannot be successful.
Please note that this is an early-stage project and, as such, we expect the roadmap to change depending on what is learned through code development, community engagement, and internal priorities. Changes to the roadmap are subject to the current governing principles of the team (see below).
Priority User Needs
Can I load data, run, and reproduce a benchmark?
Can I use the benchmarks in my model developer workflow?
How can I, as a member of the community, contribute benchmarks?
(internal, exploratory) Can we build a benchmarking package that spans multiple modalities/domains required for Virtual Cell modeling?
Launched & Live: Core Infrastructure & Reproducibility
The cz-benchmarks repository is open sourced and available as a PyPI package. With this first major release, we've…
Focused cz-benchmarks' scope on benchmarking tasks and metrics. As the first step in this work, we pulled out individual models from being integrated directly within cz-benchmarks. This separation of models from datasets, tasks & metrics enables flexible usage of the cz-benchmarks tasks and datasets across various model configurations, ensures that tasks and metrics stay open-sourced and community-driven, and follows industry practice seen in other benchmark tools.
CZI's Virtual Cell Platform is already a natural home for curated models & datasets. As such, we've added complementary support in the VCP CLI for running, viewing, and reproducing our public benchmarking results, powered by cz-benchmarks and the core resources on the Virtual Cell Platform.
By decoupling VCP from cz-benchmarks, model developers can utilize these tasks and benchmarks to evaluate their own models during development.
Published six foundational tasks that reflect common needs in single-cell biology:
Cell clustering
Cell type classification
Cross-species integration
Cross-species disease label transfer (contributed by a community working group)
Perturbation modeling (contributed by NVIDIA)
Sequential ordering (contributed by the Allen Institute)
Published six cz-benchmarks compatible benchmark datasets that support evaluations across the above listed tasks
Published tutorial notebooks and developer workflow examples
Initial domain focus: single-cell transcriptomics
Next: Support Developer Workflow & Early Contribution Workflow
In Development
Incorporate feedback on cz-benchmarks from community use
Following up on the work to pull models out of cz-benchmarks while keeping them compatible with the package, we'll follow the same pattern with benchmark datasets. The goal of this work is for cz-benchmarks to stay focused on metrics & tasks while complementing the VCP's curated and hosted models & datasets.
Expand suite of community-driven assets to incorporate additional working group recommendations and NVIDIA
Expand suite of transcriptomic perturbation benchmarks
Explore expanding domain focus to: Imaging (specifically, image representation learning, virtual staining, and dynamic imaging) and DNA models. If you're interested in contributing to early-stage work in these areas, please reach out to us at virtualcellmodels@chanzuckerberg.com.
Future Ideas
Refine contribution workflow for tasks and metrics into cz-benchmarks
NVIDIA tutorials to assist developers in leveraging cz-benchmarks in their workflows
Benchmarking on held-out datasets via hosted inference
Roadmap Governance
Our goal is to work towards a community-developed benchmarking resource that will be useful for the scientific community. In the short term, to get an alpha release initiated and stable, we currently operate using a simple governance model as follows:
Roadmap Leads: Katrina Kalantar, Olivia Holmes (CZI) and Laksshman Sundaram, TJ Chen (NVIDIA)
Tech Leads: Sanchit Gupta, Andrew Tolopko (CZI) and Ankit Sethia, Michelle Gill (NVIDIA)
Roadmap alignment and decision making are completed by the Roadmap Leads, in close collaboration with the Tech Leads, by consensus wherever possible. CZI SciTech currently maintains ownership of the repository and holds final decision-making authority. We will be working in close collaboration with NVIDIA to execute the roadmap and will continue to evolve the governance structure based on project needs, community growth, and resourcing. Guidelines and governance for included assets are available here.