Importing Data
This guide covers importing tilt series and particle coordinates into py2rely-compatible STAR file formats.
Overview
py2rely requires two input files for sub-tomogram averaging:
| File | Contents | Source |
|---|---|---|
| Tilt Series STAR | Alignment parameters, CTF info | AreTomo output |
| Particles STAR | Particle coordinates, orientations | Copick, STAR files, PyTom |
π― Import Particle Coordinates
Import from coordinates from a Copick project:
py2rely prepare particles \
--config copick_config.json \
--session 24jan01 \
--name virus-like-particle \
-ps 1.54 -x 4096 -y 4096 -z 1200 \
-v 300 -sa 2.7 -ac 0.1
β
This will write a STAR file to input/24jan01_virus_like_particle.star
Coordinate System
Copick stores coordinates in physical units (Γ ngstroms), not pixels.
py2rely automatically converts to Relion format using:
- Tomogram unbinned dimensions (
-x,-y,-z) - Tilt-series Pixel size (
-ps)
π
prepare particles Parameters
| Parameter | Short | Description | Default |
|---|---|---|---|
--config |
-c |
Path to Copick config file | required |
--session |
-s |
Experiment session identifier | required |
--name |
-n |
Protein/particle name | required |
--output |
-o |
Output directory for STAR file | input |
--session-id |
-sid |
Copick session ID filter | - |
--user-id |
-uid |
Copick user ID filter | - |
--run-ids |
-rids |
Run IDs to filter (comma-separated) | - |
--voxel-size |
-vs |
Voxel size of picked tomograms (Γ ) | - |
--x |
-x |
Tomogram x-dimension (pixels) | 4096 |
--y |
-y |
Tomogram y-dimension (pixels) | 4096 |
--z |
-z |
Tomogram z-dimension (pixels) | 1200 |
--pixel-size |
-ps |
Tilt series pixel size (Γ ) | 1.54 |
--voltage |
-v |
Acceleration voltage (kV) | 300 |
--spherical-aberration |
-sa |
Cs value (mm) | 2.7 |
--amplitude-contrast |
-ac |
Amplitude contrast | 0.07 |
--optics-group |
-og |
Optics group number | 1 |
--optics-group-name |
-ogn |
Optics group name | opticsGroup1 |
--relion5 |
Use Relion5 centered coordinates | True |
Importing Coordinates into Copick
If you have particle coordinates generated from an external tool (e.g., a neural network, template matcher, Dynamo, EMAN2, etc.), you can programmatically write them into a Copick project using the Copick Python API.
from scipy.spatial.transform import Rotation as R
import copick, starfile
import numpy as np
# Load copick project
root = copick.from_file('config.json')
# Load starfile with coordinates
df = starfile.read('particles.star')
nPoints = df.shape[0]
cx, cy, cz = df['coordX'], df['coordY'], df['coordZ']
# (Optional) Convert Relion Euler Angles to Rotation Matrices
eulers = np.stack(df['rot'], df['tilt'], df['psi'])
rot = R.from_euler('ZYZ', eulers, degrees=True)
mats = rot.inv().as_matrix() # (N, 3, 3)
orientations = np.zeros((n,4,4))
orientations[:,:3,:3] = mats
# if no orientations are available, instead set matrix to identity
# orientations[:,:3,:3] = np.identity(3)
orientations[:,3,3] = 1
# Create a Pick Entry
run = root.get_run('Position_10_1')
pick = run.get_picks(
object_name = 'ribosome',
user_id='method', session_id='1',
exist_ok = True
)
picks.from_numpy(points, orientations)
Once written to Copick, you can generate a RELION5-compatible STAR file with:
py2rely prepare particles --config copick_config.json ...
After this step, the coordinates can be used in the full sub-tomogram averaging pipeline.
Merge particles from different picking methods, sessions, or manual annotations.
py2rely prepare combine-particles \
--input input/session1_particles.star \
--input input/session2_particles.star \
--input input/manual_picks.star \
--output input/all_particles.star
Common Scenarios
- Merge automated + manual picks (derived from different copick sessionIDs, userIDs)
- Multiple experimental sessions
Use ground-truth particle coordinates from the CryoET ML Challenge dataset described in:
Peck, A. et al., Nature Methods (2025)
https://www.nature.com/articles/s41592-025-02800-5
The coordinates are hosted on the CryoET Data Portal and can be accessed via a Copick configuration file.
Step 1 β Configure Copick
Ensure your copick_config.json references the following datasets:
10445/10446β Public/Private evaluation dataset
Generate a copick configuration file for a specified dataset
copick config dataportal -ds 10445 --overlay /path/to/overlay --output 10445_config.json
Step 2 β Import Ground-Truth Coordinates
Use the standard prepare particles command, filtering by author:
py2rely prepare particles \
--config 10445_config.json \
--session 10445 --name virus-like-particle \
--authors "Jonathan Schwartz"
The -a / --authors flag filters picks to those corresponding to
challenge ground truth annotations.
β This generates a Relion5-compatible STAR file that can be used for:
- Benchmarking particle picking methods
- Evaluating recall / precision
- Comparing automated picks to known ground truth
Why use the author filter?
The ML challenge datasets contain multiple annotation sources. Filtering by author ensures that you retrieve the curated ground-truth coordinates used for evaluation.
π Import Tilt Series
Import tilt series alignment from an AreTomo processing session.
What it does
py2rely reads these files and converts them into RELION's expected format, which includes:
- Tilt series alignment (
.aln) - CTF parameters (defocus, astigmatism)
- Dose weighting information
- Optics group assignments
- Generates Relion-compatible STAR files
py2rely prepare tilt-series \
--base-project /path/to/aretomo \
-s 24jan01 -r run001 \
-v 300 -sa 2.7 -ac 0.1 \
--pixel-size 1.54 --total-dose 60
How does py2rely find my data?
Given --base-project, --session, and optionally --run,
py2rely searches for tilt series using:
{base-project}/{session}/{run}/*_CTF.txt
If --run is omitted, all runs within the session directory are searched:
{base-project}/{session}/*_CTF.txt
Output structure:
input/tiltSeries/
βββ aligned_tilt_series.star # Global file (use this)
βββ tomo001.star
βββ tomo002.star
βββ ...
π
prepare tilt-series Parameters
| Parameter | Description | Default |
|---|---|---|
--base-project |
AreTomo project root directory | /hpc/projects/.../aretomo3 |
-s, --session |
Session identifier | 23dec21 |
-r, --run |
Run identifier | run001 |
-o, --output |
Output directory for STAR files | input |
-ps, --pixel-size |
Unbinned pixel size (Γ ) | 1.54 |
-td, --total-dose |
Total dose (eβ»/Γ Β²) | 60 |
-v, --voltage |
Acceleration voltage (kV) | 300 |
-sa, --spherical-aberration |
Cs value (mm) | 2.7 |
-ac, --amplitude-contrast |
Amplitude contrast | 0.07 |
-og, --optics-group |
Optics group number | 1 |
-ogn, --optics-group-name |
Optics group name | opticsGroup1 |
Import tilt series and alignments directly from datasets hosted on the
Chan Zuckerberg CryoET Data Portal.
Download Command
copick download project -ds 10445 -o path/to/files
Warning
Make sure you are using copick β₯ v1.18.
The command retrieves all files required for sub-tomogram averaging, including:
- Tilt series stacks (
*.mrc) - Alignment files (
*.aln) - CTF estimation outputs (
*_CTF.txt) - Metadata required by
py2rely(e.g.,ordered_list.csv)
The resulting directory mirrors the structure expected from an AreTomo processing session, meaning it can be used directly with:
py2rely prepare tilt-series ...
copick download project Parameters
| Parameter | Short | Description |
|---|---|---|
--dataset |
-ds |
CryoET Data Portal dataset ID |
--output |
-o |
Output directory for downloaded files |
What is this workflow used for?
This workflow is designed primarily for method developers and benchmarking, where you want to:
- Validate particle coordinates against public datasets
- Reproduce or compare published reconstructions
- Test new picking or averaging methods on standardized data
If you collected data across multiple days or runs, you can merge them into a single tilt series file.
Why combine sessions?
- Increase particle count for better statistics
- Pool data from multiple grid areas
- Combine different imaging conditions (optics groups)
# Import each session
py2rely prepare tilt-series --session 24jan01 --output input/
py2rely prepare tilt-series --session 24feb15 --output input/
# Combine them
py2rely prepare combine-tilt-series \
--input input/tiltSeries/aligned_tilt_series_24jan01.star \
--input input/tiltSeries/aligned_tilt_series_24feb15.star \
--output input/tiltSeries/aligned_tilt_series.star
Remove tomograms that don't contain any particles to speed up processing.
Why filter?
- Reduces computational overhead
- Avoids unnecessary pseudo-subtomogram extraction
- Keeps job logs cleaner
py2rely prepare filter-unused-tilts \
--particles input/24jan01_virus_like_particle.star \
--tomograms input/tiltSeries/aligned_tilt_series.star
What happens?
- Reads which tomograms appear in your particles file
- Removes tilt series entries for tomograms with zero particles
- Overwrites the original tilt series STAR file (makes backup first)
π Coordinate Systems
Relion 5.0 Format
py2rely uses Relion 5.0 centered coordinate convention:
| Type | Columns | Units | Origin |
|---|---|---|---|
| Centered | rlnCenteredCoordinate[XYZ]Angst |
Γ ngstroms | Center of tomogram |
| Pixel | rlnCoordinate[XYZ] |
Pixels | Top-left corner |
Conversion formula:
centered_angstrom = (pixel_coordinate - tomogram_size/2) Γ pixel_size
Complete Example
# 1. Import tilt series
py2rely prepare tilt-series \
--session 24jan01 \
--output input \
--pixel-size 1.54
# 2. Import particles
py2rely prepare particles \
--config copick.json \
--session 24jan01 \
--name ribosome \
--output input
# 3. Clean up unused tomograms
py2rely prepare filter-unused-tilts \
-p input/24jan01_virus_like_particle.star \
-t input/tiltSeries/aligned_tilt_series.star
Next Steps
- Ready to refine? β 3D Sub-tomogram Averaging
- Want to validate first? β 2D Slab Classification
- Need examples? β Quick Start Guide