CryoET Data Portal | Documentation

The Chan Zuckerberg Institute for Advanced Biological Imaging (CZ Imaging Institute) has made a beta release of the CryoET Data Portal, which provides organized, queryable data from CryoET experiments. Each of the more than 10,000 tomograms on the Portal has, at a minimum, “ground truth” point annotations of ribosomes.

This site provides additional documentation for accessing the CryoET Data Portal through its Python API to query and download the data. The initial target users are segmentation algorithm developers, who can use the data to produce annotations for diverse macromolecules in the tomograms. Such annotations may in turn be used for high-resolution subtomogram averaging.

We welcome feedback from the community on the data structure, design and functionality. Share first impressions, or sign up for invites to future feedback activities in this short form. Submit questions, bugs, and feature requests for the CryoET Data Portal via GitHub issues.

Getting Started

Amazon Web Services S3 Bucket Info

The CryoET Data Portal S3 bucket supports public access. The bucket URL is:

s3://cryoet-data-portal-public

To list the bucket contents with the AWS CLI without credentials, use the following:

aws s3 ls --no-sign-request s3://cryoet-data-portal-public
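The same anonymous listing can be sketched in Python using only the standard library, for readers who do not have the AWS CLI installed. This is a minimal sketch, not an official client. It assumes the bucket answers unsigned ListObjectsV2 requests over HTTPS at the generic virtual-hosted S3 endpoint, which the public-access statement above suggests but does not spell out. The helper names build_list_url, parse_listing, and list_bucket are hypothetical and exist only for this illustration.

```python
"""Anonymously list one level of the public CryoET Data Portal bucket (sketch)."""
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

BUCKET = "cryoet-data-portal-public"  # bucket name taken from the S3 URL above
# Assumption: the generic virtual-hosted S3 endpoint serves this bucket.
ENDPOINT = f"https://{BUCKET}.s3.amazonaws.com/"
# XML namespace used by S3 list responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(prefix: str = "", max_keys: int = 100) -> str:
    """Build an unsigned ListObjectsV2 URL for a single 'directory' level."""
    query = urllib.parse.urlencode(
        {
            "list-type": "2",      # request the V2 listing API
            "delimiter": "/",      # group deeper keys into common prefixes
            "prefix": prefix,
            "max-keys": str(max_keys),
        }
    )
    return f"{ENDPOINT}?{query}"


def parse_listing(xml_data: Union[str, bytes]) -> Tuple[List[str], List[str]]:
    """Return (sub-prefixes, object keys) from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_data)
    prefixes = [e.text for e in root.findall("s3:CommonPrefixes/s3:Prefix", S3_NS)]
    keys = [e.text for e in root.findall("s3:Contents/s3:Key", S3_NS)]
    return prefixes, keys


def list_bucket(prefix: str = "") -> Tuple[List[str], List[str]]:
    """Fetch and parse one page of results; pagination is not handled."""
    with urllib.request.urlopen(build_list_url(prefix), timeout=30) as response:
        return parse_listing(response.read())


if __name__ == "__main__":
    # Requires network access; prints the top-level entries of the bucket.
    sub_prefixes, object_keys = list_bucket()
    for name in sub_prefixes + object_keys:
        print(name)
```

Passing one of the returned sub-prefixes back to list_bucket descends one level. Only the first page of results is read, so large listings would need the continuation token that S3 includes in truncated responses.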

CryoET Workflow Overview

Figure: Electron tomography workflow and the data we provide.

A. The sample is rotated to different tilt angles, and electrons pass through it to produce projection images of the 3D volume.

B. We provide the raw movie frames collected by a direct detector and may also provide these stacked into a tilt series of images.

C. A 3D tomographic volume is reconstructed by back-projecting the projection images, which are first corrected in a variety of ways (motion correction, CTF estimation, etc.).

D. We provide the 3D volume together with any available point annotations or semantic segmentations of macromolecular complexes for each volume.

Citing the CryoET Data Portal

Work that uses data from the Portal must acknowledge the data providers and cite the original publications. The following is provided as an example:

Some of the data used in this work was provided by the group(s) of Julia Mahamid (EMBL)/Jürgen Plitzko (MPI) [see beta site for current details]. The work is described more fully in the publication:

Provider          Dataset name    Acknowledgement
Julia Mahamid     10000           doi:10.1038/s41592-022-01746-2
Julia Mahamid     10001           doi:10.1038/s41592-022-01746-2
Jürgen Plitzko    10004           doi:10.1101/2023.04.28.538734

Note

Segmentation experts and developers are also encouraged to contact the data providers if they have developed a tool that could help process the full datasets (which are much larger than the subsets provided on the Portal) more efficiently or effectively.