Quick start

This page provides details to help you get started using the CryoET Data Portal Client API.

Contents

  1. Installation

  2. API Methods Overview

  3. Example Code Snippets

Installation

Requirements

The CryoET Data Portal Client requires a Linux or macOS system with:

  • Python 3.7 to Python 3.11.

  • Recommended: >16 GB of memory.

  • Recommended: >5 Mbps internet connection.

  • Recommended: for increased performance, use the API from an AWS EC2 instance in the us-west-2 region. The CryoET Portal data are hosted in an AWS S3 bucket in that region.
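The hard requirements above (operating system and Python version) can be checked before installing. The following is a minimal sketch using only the standard library; the helper names python_supported and system_supported are hypothetical, and the version bounds are copied from the list above. The memory and bandwidth recommendations are not checked.

```python
import platform
import sys

# Bounds copied from the requirements list (Python 3.7 to Python 3.11).
MIN_PYTHON = (3, 7)
MAX_PYTHON = (3, 11)
# platform.system() reports macOS as "Darwin".
SUPPORTED_SYSTEMS = {"Linux", "Darwin"}


def python_supported(version=None):
    """Return True if the (major, minor) version lies in the documented range."""
    if version is None:
        version = sys.version_info[:2]
    return MIN_PYTHON <= tuple(version[:2]) <= MAX_PYTHON


def system_supported(system=None):
    """Return True if the operating system is Linux or macOS."""
    if system is None:
        system = platform.system()
    return system in SUPPORTED_SYSTEMS


print("Python version supported:", python_supported())
print("Operating system supported:", system_supported())
```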

Install in a Virtual Environment

(Optional) In your working directory, make and activate a virtual environment or conda environment. For example:

python -m venv ./venv
source ./venv/bin/activate

Install the latest cryoet_data_portal package via pip:

pip install -U cryoet-data-portal
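To confirm that the installation succeeded, you can check that the module is importable from the active environment. This is a minimal sketch using only the standard library; is_installed is a hypothetical helper name. Note that the pip package name uses hyphens while the importable module name uses underscores.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("cryoet_data_portal"):
    print("cryoet_data_portal is available in this environment")
else:
    print("cryoet_data_portal not found; run: pip install -U cryoet-data-portal")
```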

API Methods Overview

The Portal API has methods for searching and downloading data. Every class has find and get_by_id methods for selecting data, and some classes also have download methods (named download...) for retrieving the data. Below is a table of the API classes that have download methods.

Class                  Download Methods
Annotation             download, download_metadata
Dataset                download_everything
Run                    download_everything
TiltSeries             download_alignment_file, download_angle_list, download_mrcfile, download_omezarr
Tomogram               download_all_annotations, download_mrcfile, download_omezarr
TomogramVoxelSpacing   download_everything

The find method selects data based on user-chosen queries. A query compares a field against a string or number, using either a Python comparison operator (==, !=, >, >=, <, <=) or one of the method operators like, ilike, and _in. The method operators are defined in the table below:

Method Operator   Definition
like              partial match, with the % character being a wildcard
ilike             case-insensitive partial match, with the % character being a wildcard
_in               accepts a list of values that are acceptable matches
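To make the matching rules in the table concrete, here is a small pure-Python sketch of the semantics described above. It is an illustration only, not the Portal's implementation: the function names like, ilike, and is_in are hypothetical stand-ins, and only the % wildcard from the table is modeled.

```python
import re


def like_to_regex(pattern, case_insensitive=False):
    """Compile a pattern in which % matches any run of characters, including none."""
    # Escape each literal piece so regex metacharacters are matched verbatim.
    body = ".*".join(re.escape(part) for part in pattern.split("%"))
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.compile(body, flags)


def like(value, pattern):
    """Case-sensitive match against a % wildcard pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


def ilike(value, pattern):
    """Case-insensitive match against a % wildcard pattern."""
    return like_to_regex(pattern, case_insensitive=True).fullmatch(value) is not None


def is_in(value, accepted):
    """Match if the value equals any entry in the list of accepted values."""
    return value in accepted


print(like("plasma membrane", "%membrane%"))
print(like("Plasma Membrane", "%membrane%"))
print(ilike("Plasma Membrane", "%membrane%"))
print(is_in("ribosome", ["ribosome", "membrane"]))
```

With these rules, "%membrane%" matches any value containing "membrane", while "membrane%" matches only values that start with it.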

The general format of using the find method is as follows:

data_of_interest = ClassName.find(client, [queries])

The get_by_id method allows you to select data using the ID found on the Portal. For example, to select the data for Dataset 10005 on the Portal and download it into your current directory, use this snippet:

from cryoet_data_portal import Client, Dataset
client = Client()
data_10005 = Dataset.get_by_id(client, 10005)
data_10005.download_everything()
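When the ID comes from user input, it can help to guard against an ID that does not exist. The sketch below wraps the two calls from the snippet above in a hypothetical helper, download_by_id. It rests on two assumptions that are not stated on this page: that get_by_id returns None when no object matches, and that download_everything accepts a dest_path argument. Check the API reference before relying on either.

```python
def download_by_id(model_cls, client, object_id, dest_path=None):
    """Hypothetical helper: fetch an object by ID and download all of its files.

    Returns the object, or None when nothing matches the ID.
    """
    obj = model_cls.get_by_id(client, object_id)
    if obj is None:
        # Assumption: get_by_id yields None for an unknown ID.
        return None
    if dest_path is None:
        # Same call as in the snippet above: download into the current directory.
        obj.download_everything()
    else:
        # Assumption: download_everything accepts a dest_path argument.
        obj.download_everything(dest_path=dest_path)
    return obj


# Intended use with the real client (requires network access):
#   from cryoet_data_portal import Client, Dataset
#   download_by_id(Dataset, Client(), 10005, dest_path="./dataset_10005")
```

Passing the class as an argument keeps the helper usable with any class that has both get_by_id and download_everything, such as Dataset or Run.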

Examples

Below are three examples of common operations you can perform with the API. Check out the examples page for more code snippets or the tutorials page for longer examples.

Browse all data in the portal

To illustrate the relationships among the classes in the Portal, the loop below iterates over all datasets in the Portal, then over all runs in each dataset, then over all tomograms in each run, and prints the name of each object.

Attention

This loop is impractical! It iterates over all data in the Portal. It is simply for demonstrative purposes and should not be included in efficient code.

from cryoet_data_portal import Client, Dataset

# Instantiate a client, using the data portal GraphQL API by default
client = Client()

# Iterate over all datasets
for dataset in Dataset.find(client):
    print(f"Dataset: {dataset.title}")
    for run in dataset.runs:
        print(f"  - run: {run.name}")
        for tomo in run.tomograms:
            print(f"    - tomo: {tomo.name}")

The output looks something like this:

Dataset: S. pombe cells with defocus
  - run: TS_026
    - tomo: TS_026
...
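A bounded variant of the same traversal is often enough to explore the hierarchy. The sketch below caps the number of items visited at each level with itertools.islice; preview_hierarchy is a hypothetical helper that works on any objects with the title, runs, name, and tomograms attributes used in the loop above. Whether capping the iteration also reduces the amount of data requested from the Portal depends on how the client fetches results, which this page does not specify.

```python
from itertools import islice


def preview_hierarchy(datasets, max_datasets=2, max_runs=2, max_tomos=2):
    """Collect indented name lines for the first few datasets, runs, and tomograms."""
    lines = []
    for dataset in islice(datasets, max_datasets):
        lines.append(f"Dataset: {dataset.title}")
        for run in islice(dataset.runs, max_runs):
            lines.append(f"  - run: {run.name}")
            for tomo in islice(run.tomograms, max_tomos):
                lines.append(f"    - tomo: {tomo.name}")
    return lines


# Intended use with the real client (requires network access):
#   from cryoet_data_portal import Client, Dataset
#   print("\n".join(preview_hierarchy(Dataset.find(Client()))))
```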

Find all datasets containing membrane annotations

The example below uses the find method with a chained expression in the query (Dataset.runs.annotations.object_name) to select datasets that have membrane annotations, then prints the IDs of those datasets.

import cryoet_data_portal as portal

# Instantiate a client, using the data portal GraphQL API by default
client = portal.Client()

# Use the find method to select datasets that contain membrane annotations
datasets = portal.Dataset.find(client, [portal.Dataset.runs.annotations.object_name.ilike("%membrane%")])
for d in datasets:
    print(d.id)

Find all tomograms for a certain organism and download preview-sized MRC files

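The following iterates over all tomograms related to a specific organism, prints the name and metadata of each one, and shows how to download each tomogram in MRC format. The download call is commented out so that running the snippet does not fetch any files.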

import json

from cryoet_data_portal import Client, Tomogram

# Instantiate a client, using the data portal GraphQL API by default
client = Client()

# Find all tomograms related to a specific organism
tomos = Tomogram.find(
    client,
    [
        Tomogram.run.dataset.organism_name == "Schizosaccharomyces pombe"
    ],
)

for tomo in tomos:
    # Access any useful metadata for each tomogram
    print(tomo.name)

    # Print the tomogram metadata as a json string
    print(json.dumps(tomo.to_dict(), indent=4))

    # Download a tomogram in the MRC format (uncomment to actually download files)
    # tomo.download_mrcfile()

Downloads display a progress bar by default. The printed name and metadata for each tomogram look like this:

TS_026
{
    "id": 121,
    "tomogram_voxel_spacing_id": 1,
    "name": "TS_026",
... more output ...