CryoET Data Portal | Documentation

The Chan Zuckerberg Institute for Advanced Biological Imaging (CZ Imaging Institute) has made a beta release of the CryoET Data Portal, which provides queryable, organized data from CryoET experiments. Each of the nearly 20,000 tomograms on the Portal has at least one annotated structure.

Learn about the Data Schema

This site provides additional documentation for using our Python API to query and download data, and for navigating the CryoET Data Portal and its visualization tools. We hope it will help developers of segmentation algorithms produce annotations for diverse macromolecules in the tomograms, which may then be used for high-resolution subtomogram averaging.

We welcome feedback from the community on the data structure, design and functionality.

  • Share first impressions, or sign up for invites to future feedback activities in this short form.

  • Submit bugs for the CryoET Data Portal via GitHub issues.

  • Start a GitHub discussion with questions or to request new features.

Getting Started

  • Get Started: Install and start using the Python Client API (Quickstart)

  • API Reference: Information on the Python Client API classes

  • Tutorials: Examples of selecting, downloading, and visualizing data from the Portal

  • About CryoET Data Portal: Learn about CryoET data and how to find and preview it on the Portal

Amazon Web Services S3 Bucket Info

The CryoET Data Portal S3 bucket supports public access. The bucket URL is:

s3://cryoet-data-portal-public

To list the bucket contents with the AWS CLI without credentials, use the following command:

aws s3 ls --no-sign-request s3://cryoet-data-portal-public

Refer to this how-to guide for information on downloading data from our AWS S3 bucket.
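The listing command above can also be driven from a Python script. The following is a minimal sketch, assuming the AWS CLI is installed and on the PATH. The helper names `build_ls_command` and `list_prefix` and the `prefix` parameter are illustrative only and are not part of any Portal tooling.

```python
import subprocess

BUCKET_URL = "s3://cryoet-data-portal-public"  # bucket URL given above


def build_ls_command(prefix: str = "") -> list[str]:
    """Build the `aws s3 ls` argument list for the public bucket.

    `prefix` is an optional key prefix inside the bucket (a hypothetical
    parameter for this sketch); leading slashes are ignored.
    """
    target = BUCKET_URL
    cleaned = prefix.lstrip("/")
    if cleaned:
        target = f"{BUCKET_URL}/{cleaned}"
    # --no-sign-request skips credential signing, which the public bucket allows.
    return ["aws", "s3", "ls", "--no-sign-request", target]


def list_prefix(prefix: str = "") -> list[str]:
    """Run the command and return its output lines.

    Requires the AWS CLI and network access; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    result = subprocess.run(
        build_ls_command(prefix), capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

Calling `list_prefix()` with no argument runs the same command shown above. Keeping the command construction separate from the call makes it possible to inspect or log the exact arguments before anything touches the network.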

Citing the CryoET Data Portal

Portal Citation

If you use the CryoET Data Portal in your work, please cite the following publication:

Ermel, U., Cheng, A., Ni, J.X. et al. A data portal for providing standardized annotations for cryo-electron tomography. Nat Methods 21, 2200–2202 (2024). https://doi.org/10.1038/s41592-024-02477-2

Acknowledging Data Contributors

If you use data from the Portal in your work, please acknowledge the authors and cite associated publications. Below is an example of recommended formatting:

Some of the data used in this work was provided by Irene de Teresa Trueba et al. and Mallak Ali et al. The data are available through the CryoET Data Portal (Nat Methods 21, 2200–2202 (2024). https://doi.org/10.1038/s41592-024-02477-2) with the following metadata.

| Deposition ID | Entity Type | Entity ID(s) | Primary Author(s) | Associated Publication DOI(s) |
|---|---|---|---|---|
| 10000 | Dataset | 10000, 10001 | Irene de Teresa Trueba | 10.1101/2022.04.12.488077, 10.1038/s41592-022-01746-2 |
| 10312 | Dataset | 10442 | Mallak Ali, Ariana Peck, Yue Yu, Jonathan Schwartz | None |

Finding Citation Metadata via the API or GraphQL

You can programmatically retrieve metadata for your citations via the Python API or GraphQL.

API Example - Dataset, Annotation, or Tomogram

from cryoet_data_portal import Client, Dataset

client = Client()
dataset = Dataset.get_by_id(client, 10442)  # use the numeric Dataset ID
print(dataset.deposition_id)  # deposition ID
print(dataset.dataset_publications)  # DOIs of associated publications (None if there are none)

Output:

10312
None

API Example - Runs

For runs, access the deposition ID and publication DOIs through the parent dataset:

from cryoet_data_portal import Client, Run
client = Client()

run = Run.get_by_id(client, 10005)  # use the numeric Run ID
print(run.dataset.deposition_id)  # deposition ID of the parent dataset
print(run.dataset.dataset_publications)  # DOIs of associated publications (None if there are none)

Output:

10029
10.1073/pnas.1518952113
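When a project draws on several datasets, the fields printed above can be grouped into rows for the metadata table recommended under Acknowledging Data Contributors. The sketch below shows one way to do that. `citation_rows` is a hypothetical helper, not part of `cryoet_data_portal`, and it relies only on the `id`, `deposition_id`, and `dataset_publications` attributes used in the examples above. It assumes publications arrive as `None`, a comma-separated string, or a list of DOIs. Author names are not handled and would need to be added separately.

```python
from collections import defaultdict
from types import SimpleNamespace


def citation_rows(datasets):
    """Group datasets by deposition ID into citation-table rows.

    Each item needs `id`, `deposition_id`, and `dataset_publications`
    attributes, as on the Dataset objects in the examples above.
    """
    grouped = defaultdict(lambda: {"ids": [], "dois": []})
    for ds in datasets:
        entry = grouped[ds.deposition_id]
        entry["ids"].append(ds.id)
        # Assumption: publications are None, a comma-separated string, or a list.
        pubs = ds.dataset_publications or []
        if isinstance(pubs, str):
            pubs = pubs.split(",")
        for doi in pubs:
            doi = doi.strip()
            if doi and doi not in entry["dois"]:
                entry["dois"].append(doi)
    return [
        {
            "deposition_id": dep_id,
            "entity_type": "Dataset",
            "entity_ids": sorted(entry["ids"]),
            "publication_dois": entry["dois"] or None,
        }
        for dep_id, entry in sorted(grouped.items())
    ]


# Stand-in objects carrying the values from the example table above; in real
# use, pass the objects returned by Dataset.get_by_id(client, ...) instead.
shared_dois = "10.1101/2022.04.12.488077, 10.1038/s41592-022-01746-2"
example = [
    SimpleNamespace(id=10442, deposition_id=10312, dataset_publications=None),
    SimpleNamespace(id=10000, deposition_id=10000, dataset_publications=shared_dois),
    SimpleNamespace(id=10001, deposition_id=10000, dataset_publications=shared_dois),
]
for row in citation_rows(example):
    print(row)
```

Grouping by deposition ID mirrors the example table, where datasets 10000 and 10001 share a single row.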

GraphQL Example - Dataset, Annotation, or Tomogram

query GetDatasetPublication {
  datasets(where: { id: { _eq: 10442 } }) {
    id
    depositionId
    datasetPublications
  }
}

Output:

{
  "data": {
    "datasets": [
      {
        "id": 10442,
        "depositionId": 10312,
        "datasetPublications": null
      }
    ]
  }
}

GraphQL Example - Runs

For runs, access the deposition ID and publication DOIs through the parent dataset:

query GetRunPublication {
  runs(where: { id: { _eq: 10005 } }) {
    id
    dataset {
      depositionId
      datasetPublications
    }
  }
}

Output:

{
  "data": {
    "runs": [
      {
        "id": 10005,
        "dataset": {
          "depositionId": 10029,
          "datasetPublications": [
            "10.1073/pnas.1518952113"
          ]
        }
      }
    ]
  }
}
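These queries can also be sent from a script as an HTTP POST with a JSON body, which is the common convention for GraphQL servers. The sketch below builds that body for the dataset query and extracts the citation fields from responses shaped like the two outputs above. The helper names are hypothetical, and the endpoint URL is left as a required argument because it is not stated on this page.

```python
import json
import urllib.request

# Dataset query from above, with the ID filled in at call time.
DATASET_QUERY = """
query GetDatasetPublication {
  datasets(where: { id: { _eq: %d } }) {
    id
    depositionId
    datasetPublications
  }
}
"""


def build_payload(dataset_id: int) -> bytes:
    """Serialize the dataset query as a GraphQL-over-HTTP JSON body."""
    return json.dumps({"query": DATASET_QUERY % int(dataset_id)}).encode("utf-8")


def post_query(endpoint: str, payload: bytes) -> dict:
    """POST the payload; `endpoint` must be supplied by the caller."""
    request = urllib.request.Request(
        endpoint, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def citation_fields(response: dict) -> list[tuple]:
    """Return (entity ID, deposition ID, publications) for dataset or run responses."""
    data = response["data"]
    found = []
    for ds in data.get("datasets", []):
        found.append((ds["id"], ds["depositionId"], ds["datasetPublications"]))
    for run in data.get("runs", []):
        parent = run["dataset"]  # runs carry these fields on the parent dataset
        found.append((run["id"], parent["depositionId"], parent["datasetPublications"]))
    return found
```

Passing the result of `post_query(endpoint, build_payload(10442))` to `citation_fields` would yield one tuple per matching dataset. The extraction step can be checked offline against the sample outputs shown above.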

Note

Segmentation experts and developers who have built a tool that could help process the full datasets more efficiently or effectively are encouraged to contact the data providers. The full datasets are much larger than the subsets provided on the Portal.