Data Import Guide¶
octopi leverages copick to provide a flexible, unified interface for accessing tomographic data, whether it is stored locally, remotely on an HPC server, or on our CryoET Data Portal. This guide explains how to work with both local data and portal data. If you need help creating copick configuration files, detailed tutorials are available:
- Copick Quickstart - Basic configuration and setup
- Data Portal Tutorial - Working with CryoET Data Portal
Data Resolution¶
Before importing data, it's important to consider the resolution. We recommend working with tomograms at a voxel size of 10 Å (1 nm) for optimal performance. You can downsample higher-resolution tomograms during import.
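As a rough illustration of what downsampling to 10 Å implies, the sketch below computes the downsampling factor and the approximate output grid size from the input and target voxel sizes. The function name and the example grid dimensions are hypothetical, and octopi's own resampling may round or crop differently.

```python
import math

def downsample_plan(shape, input_voxel_size, output_voxel_size=10.0):
    """Return (factor, new_shape) for resampling a tomogram grid.

    shape: (z, y, x) grid size in voxels (hypothetical example values below).
    Voxel sizes are in Angstroms.
    """
    if input_voxel_size <= 0 or output_voxel_size <= 0:
        raise ValueError("voxel sizes must be positive")
    if output_voxel_size < input_voxel_size:
        raise ValueError("output voxel size is finer than the input (upsampling)")
    factor = output_voxel_size / input_voxel_size
    # The physical extent is preserved, so each axis shrinks by `factor`.
    new_shape = tuple(max(1, math.floor(n / factor)) for n in shape)
    return factor, new_shape

factor, new_shape = downsample_plan((400, 1920, 1856), input_voxel_size=5.0)
print(factor, new_shape)
```

Going from 5 Å to 10 Å halves each axis, so the volume holds roughly one eighth as many voxels.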
💡 Example Copick Config File (config.json)
The copick configuration file points to a directory (the overlay root) that stores all the tomograms, coordinates, and segmentations. The config file also defines all the pickable objects, which octopi reads to determine target segmentations and to convert predicted segmentation masks into object coordinates.
{
"name": "test",
"description": "A test project description.",
"version": "1.0.0",
"pickable_objects": [
{
"name": "ribosome",
"is_particle": true,
"pdb_id": "7P6Z",
"label": 1,
"color": [0, 255, 0, 255],
"radius": 150,
"map_threshold": 0.037
},
{
"name": "membrane",
"is_particle": false,
"label": 2,
"color": [0, 0, 0, 255]
}
],
// Change this path to the location of sample_project
"overlay_root": "local:///PATH/TO/EXTRACTED/PROJECT/",
"overlay_fs_args": {
"auto_mkdir": true
}
}
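To see how the pickable_objects list maps object names to segmentation labels, the following sketch reads a config and builds a name-to-label table using only the Python standard library. The helper names are hypothetical and this is not octopi's or copick's own loader. Because the example above contains a // comment, which strict JSON does not allow, the sketch drops whole-line comments before parsing.

```python
import json

def load_config_text(text):
    """Parse config text, dropping whole-line // comments first."""
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("//")]
    return json.loads("\n".join(lines))

def label_table(config):
    """Map each pickable object name to (label, is_particle)."""
    table = {}
    seen_labels = set()
    for obj in config["pickable_objects"]:
        if obj["label"] in seen_labels:
            raise ValueError(f"duplicate label {obj['label']}")
        seen_labels.add(obj["label"])
        table[obj["name"]] = (obj["label"], obj["is_particle"])
    return table

# Abbreviated copy of the example config above.
example = """
{
  "name": "test",
  "pickable_objects": [
    {"name": "ribosome", "is_particle": true, "label": 1, "radius": 150},
    {"name": "membrane", "is_particle": false, "label": 2}
  ],
  // Change this path to the location of sample_project
  "overlay_root": "local:///PATH/TO/EXTRACTED/PROJECT/"
}
"""
config = load_config_text(example)
table = label_table(config)
particles = [name for name, (_, is_particle) in table.items() if is_particle]
print(table, particles)
```

Checking for duplicate labels up front is a cheap way to catch a config in which two objects would collide in the same segmentation mask value.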
Importing Local MRC Tomograms¶
If you have tomograms stored locally in *.mrc format (e.g., from Warp, IMOD, or AreTomo), you can import them into a copick project:
octopi import-mrc-volumes \
--input-folder /path/to/mrc/files \
--config /path/to/config.json \
--target-tomo-type denoised \
--input-voxel-size 5 \
--output-voxel-size 10
To satisfy the recommended resolution requirement, we can downsample tomograms to the desired voxel size by specifying both the original voxel size (--input-voxel-size) and the desired voxel size (--output-voxel-size). In cases where downsampling is unnecessary, simply omit the --output-voxel-size parameter.
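When imports are scripted, the optional downsampling flag can be handled programmatically. The sketch below assembles the argument list for the command shown above and appends --output-voxel-size only when a target is given. The helper function is hypothetical, and the flags are copied from the example command rather than from a verified CLI reference.

```python
import shlex

def build_import_command(input_folder, config, tomo_type,
                         input_voxel_size, output_voxel_size=None):
    """Assemble the import-mrc-volumes argument list from the example above."""
    cmd = [
        "octopi", "import-mrc-volumes",
        "--input-folder", str(input_folder),
        "--config", str(config),
        "--target-tomo-type", tomo_type,
        "--input-voxel-size", str(input_voxel_size),
    ]
    # Only request downsampling when a target voxel size is given.
    if output_voxel_size is not None:
        cmd += ["--output-voxel-size", str(output_voxel_size)]
    return cmd

with_downsampling = build_import_command(
    "/path/to/mrc/files", "/path/to/config.json", "denoised", 5, 10
)
without_downsampling = build_import_command(
    "/path/to/mrc/files", "/path/to/config.json", "denoised", 10
)
print(shlex.join(with_downsampling))
print(shlex.join(without_downsampling))
```

Either list can be passed to subprocess.run(cmd, check=True) once octopi is installed.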
Parameter Descriptions¶
Parameter | Description | Example |
---|---|---|
--mrcs-path | Path to directory containing MRC files | /data/tomograms/ |
--config | Path to copick config file | /project/config.json |
--target-tomo-type | Name for the tomogram type in your copick project | denoised, wbp, raw |
--input-voxel-size | Voxel size of your input MRC files (in Ångströms) | 5 (for 5 Å data) |
--output-voxel-size | (Optional) Target voxel size after downsampling | 10 (downsample to 10 Å) |
Downloading from the CryoET Data Portal¶
The CryoET Data Portal provides access to thousands of annotated tomograms. Octopi can work with this data in two ways:
1. Direct Portal Access¶
You can train models directly using data from the portal without downloading:
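One way direct access is usually set up is through a copick config that points at portal datasets instead of a local project. The sketch below writes such a config with the Python standard library. The "config_type" and "dataset_ids" keys are an assumption based on copick's Data Portal configuration format and should be checked against the copick tutorials linked above. The helper name, the project name, and the overlay path are hypothetical placeholders.

```python
import json

def portal_config(name, dataset_ids, overlay_root, pickable_objects):
    """Build a copick-style config dict that points at portal datasets.

    ASSUMPTION: the "config_type" and "dataset_ids" keys follow copick's
    Data Portal configuration format; verify against the copick docs.
    """
    return {
        "config_type": "cryoet_data_portal",
        "name": name,
        "description": "Portal-backed project.",
        "version": "1.0.0",
        "pickable_objects": pickable_objects,
        # New picks and segmentations are written to the local overlay.
        "overlay_root": overlay_root,
        "overlay_fs_args": {"auto_mkdir": True},
        "dataset_ids": list(dataset_ids),
    }

cfg = portal_config(
    name="portal_example",                      # hypothetical project name
    dataset_ids=[10445],                        # dataset ID used in this guide
    overlay_root="local:///PATH/TO/OVERLAY/",   # placeholder path
    pickable_objects=[
        {"name": "ribosome", "is_particle": True, "label": 1, "radius": 150},
    ],
)
config_text = json.dumps(cfg, indent=4)
print(config_text)
```

The printed text can be saved as config.json and passed to octopi through --config in the same way as a local config.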
2. Local Download and Processing¶
For larger datasets or when running multiple experiments, it is recommended to download the data first:
octopi download-dataportal \
--config /path/to/config.json \
--datasetID 10445 \
--overlay-path /path/to/saved/zarrs \
--input-voxel-size 5 --output-voxel-size 10 \
--dataportal-name wbp --target-tomo-type wbp
As with local MRC import, you can downsample portal data by specifying both --input-voxel-size and --output-voxel-size. The downloaded tomograms are saved locally under the tomogram type given by --target-tomo-type. To find the tomogram names available for a dataset on the portal, use:
Next Steps¶
Once your data is imported, you can:
- Try the Quick Start - Complete end-to-end workflow example
- Prepare Training Data - Set up your particle annotations
- Start Training Models - Train custom 3D U-Net models
- Run Inference - Apply trained models to new data