Work in Progress Imaging Metadata Schema

Contact: mcaton@chanzuckerberg.com and utz.ermel@czii.org

Document Status: Draft

Version: 1.0.0

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14, RFC2119, and RFC8174 when, and only when, they appear in all capitals, as shown here.

Schema versioning

The cross-modality schema version is based on Semantic Versioning.

Major version is incremented when incompatible schema updates are introduced:

  • Renaming metadata fields

  • Deprecating metadata fields

  • Changing the type or format of a metadata field

Minor version is incremented when additive schema updates are introduced:

  • Adding metadata fields

  • Changing the validation requirements for a metadata field

Patch version is incremented for editorial updates.

All changes are documented in the schema Changelog.
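
The versioning rules above can be sketched as a small helper. This is a minimal illustration, not part of the schema: the change labels and the `next_version` function are hypothetical names, and resetting the lower components to zero follows the general Semantic Versioning convention rather than anything stated in this document.

```python
# Change kinds taken from the lists above; the string labels are hypothetical.
BUMP_FOR_CHANGE = {
    "rename_field": "major",
    "deprecate_field": "major",
    "change_field_type": "major",
    "add_field": "minor",
    "change_validation": "minor",
    "editorial": "patch",
}


def next_version(current: str, changes: list[str]) -> str:
    """Return the version that follows `current` once `changes` are released."""
    major, minor, patch = (int(part) for part in current.split("."))
    if not changes:
        return current
    bumps = {BUMP_FOR_CHANGE[change] for change in changes}
    # The most significant bump wins; lower components reset to zero
    # (Semantic Versioning convention, assumed here).
    if "major" in bumps:
        return f"{major + 1}.0.0"
    if "minor" in bumps:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

For example, a release that both adds a field and renames another would be treated as a major release under this sketch.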

Background

Across the CZI network, we aim to standardize imaging data and metadata for ease of sharing, management, and downstream model training. In line with this goal, we have outlined how the Dynamic Cell Atlas and the CryoET Portal implement the REQUIRED cross-modality schema. Given the variety of data formats and experimental metadata, we will continue to add to this set of requirements in the imaging working group. This document serves as a working draft and set of minimal standards.

Overview

This document is organized into two sections: cross-modality mapping for Dynamic Cell Atlas and cross-modality mapping for the CryoET portal.

Ontologies

These are the ontologies used.

With the exception of Cellosaurus, ontology terms for metadata MUST use OBO-format identifiers, meaning a CURIE (prefixed identifier) of the form Ontology:Identifier. For example, EFO:0000001 is a term in the Experimental Factor Ontology (EFO). Cellosaurus requires a prefixed identifier of the form Ontology_Identifier such as CVCL_1P02.

If an ontology is missing a required term, its ontologists are responsive to New Term Requests (NTRs). For example, “[NTR] Version specific Visium assays” was created for CELLxGENE Discover requirements.

| Ontology | OBO Prefix |
| --- | --- |
| C. elegans Development Ontology | WBls |
| C. elegans Gross Anatomy Ontology | WBbt |
| Cell Ontology | CL |
| Cellosaurus | CVCL_ |
| Drosophila Anatomy Ontology | FBbt |
| Drosophila Development Ontology | FBdv |
| Experimental Factor Ontology | EFO |
| Gene Ontology | GO |
| Human Developmental Stages | HsapDv |
| Mondo Disease Ontology | MONDO |
| Mouse Developmental Stages | MmusDv |
| NCBI organismal classification | NCBITaxon |
| Phenotype And Trait Ontology | PATO |
| Uberon multi-species anatomy ontology | UBERON |
| Zebrafish Anatomy Ontology | ZFA, ZFS |
| Cell Line Ontology | CLO |
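
The identifier formats described above can be checked with a short helper. This is a minimal sketch under stated assumptions: the function name is hypothetical, local identifiers are assumed to be purely numeric, and Cellosaurus accessions are assumed to be upper-case alphanumeric after `CVCL_`. It only checks the shape of an identifier, not whether the term exists in the ontology.

```python
import re

# Prefixes copied from the ontology table; Cellosaurus is handled separately.
OBO_PREFIXES = {
    "WBls", "WBbt", "CL", "FBbt", "FBdv", "EFO", "GO", "HsapDv", "MONDO",
    "MmusDv", "NCBITaxon", "PATO", "UBERON", "ZFA", "ZFS", "CLO",
}

# Assumption: the local part of a CURIE is digits only.
_CURIE = re.compile(r"^(?P<prefix>[A-Za-z]+):(?P<local>[0-9]+)$")
# Assumption: Cellosaurus accessions look like CVCL_1P02.
_CELLOSAURUS = re.compile(r"^CVCL_[A-Z0-9]+$")


def is_well_formed_term_id(term_id: str) -> bool:
    """Check the shape of an ontology term ID; existence is not verified."""
    if _CELLOSAURUS.match(term_id):
        return True
    match = _CURIE.match(term_id)
    return bool(match) and match.group("prefix") in OBO_PREFIXES
```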

Cross-modality mapping for Dynamic Cell Atlas

This section describes how the ontology terms in the Dynamic Cell Atlas (DCA) sample-level metadata table below map to the cross-modality schema fields.

| DCA | CZI Crossmodal | Matching Ontology? |
| --- | --- | --- |
| factor value[assay_ontology_term_id] | assay_ontology_term_id | Yes (will update microscopy terms) |
| factor value[assay] | assay | No (FBbi) |
| factor value[developmental_stage_ontology_term_id] | development_stage_ontology_term_id | Yes (HsapDv, MmusDv, ZFS, WBls, FBdv) |
| factor value[developmental_stage] | development_stage | Yes (HsapDv, MmusDv, ZFS, WBls, FBdv) |
| factor value[disease_ontology_term_id] | disease_ontology_term_id | Yes (MONDO, PATO) |
| factor value[disease] | disease | Yes (MONDO, PATO) |
| factor value[organism_ontology_term_id] | organism_ontology_term_id | Yes (NCBITaxon) |
| factor value[organism] | organism | Yes (NCBITaxon) |
| factor value[tissue_ontology_term_id] | tissue_ontology_term_id | Yes (UBERON) |
| factor value[tissue] | tissue | Yes (UBERON) |
| factor value[tissue_type] | tissue_type | Yes (NA) |

Additional Dynamic Cell Atlas Schema

The Dynamic Cell Atlas is comprised of multiple fluorescence microscopy datasets transformed into standard zarrv3 format. Therefore, we also include the minimum additional variables for identifying the original images and communicating channel metadata. A shared ontology and schema for recording channel metadata is still under development. In this section, we will describe the current method.

Pathways:

  • For each converted zarrv3 image, the atlas tracks the pathways to the original, source data for data provenance.

Source_Raw_Path

  • Key: Source_Raw_Path

  • Description: This is the path to the original raw image, which is usually on an external S3 bucket, Google Drive, or website. Most of the original files are .tif or .zarr (version 2), which can be identified from the file path. This information is recorded for data provenance.

  • Value: List[String]. Each pathway SHOULD end in “.zip”, “.tif”, “.zarr”, etc.

Source_Seg_Path

  • Key: Source_Seg_Path

  • Description: This is the path to the original segmentation image, which is usually on an external S3 bucket, Google Drive, or website. Most of the original files are .tif or .zarr (version 2), which can be identified from the file path. This information is recorded for data provenance. During the zarr conversion these arrays are embedded within the zarrv3 store as labels or segmentations. If the image does not have related segmentations or masks, the column will be left as “Not Applicable”.

  • Value: List[String]. Each pathway SHOULD end in “.zip”, “.tif”, “.zarr”, etc.

Internal_S3_Path

  • Key: Internal_S3_Path

  • Description: This is the path to the zarrv3 converted image in the Dynamic Cell Atlas database. Each of these images lives in an internal S3 bucket that CZI owns for MDR registration. Note that if a file path is provided under Source_Seg_Path, there will also be a “labels” or “segmentations” folder embedded in the zarr store that has the corresponding converted array.

  • Value: List[String]. Each pathway MUST end in “.zarr” or “ome.zarr”.
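
The suffix rules for the three path fields can be sketched as follows. This is an illustration only: `check_paths` and the example bucket names are hypothetical, the SHOULD rules are reported as warnings and the MUST rule as an error, and the source suffix list is not exhaustive because the field definitions end it with "etc.".

```python
# Suffixes named in the field definitions above. The source list ends in
# "etc.", so SOURCE_SUFFIXES is illustrative rather than exhaustive.
SOURCE_SUFFIXES = (".zip", ".tif", ".zarr")
INTERNAL_SUFFIX = ".zarr"  # also matches paths ending in "ome.zarr"


def _as_list(value):
    """Accept either a list of paths or a single string such as "Not Applicable"."""
    return [value] if isinstance(value, str) else list(value)


def check_paths(record: dict) -> list[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    for key in ("Source_Raw_Path", "Source_Seg_Path"):
        for path in _as_list(record.get(key, [])):
            if path == "Not Applicable":
                continue
            if not path.rstrip("/").endswith(SOURCE_SUFFIXES):
                findings.append(f"warning: {key} has an unexpected suffix: {path}")
    for path in _as_list(record.get("Internal_S3_Path", [])):
        if not path.rstrip("/").endswith(INTERNAL_SUFFIX):
            findings.append(f"error: Internal_S3_Path must end in .zarr: {path}")
    return findings


# Hypothetical record; the bucket names are placeholders.
example = {
    "Source_Raw_Path": ["s3://example-bucket/raw/img.tif"],
    "Source_Seg_Path": "Not Applicable",
    "Internal_S3_Path": ["s3://example-internal/img.ome.zarr"],
}
findings = check_paths(example)
```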

Channel Metadata Fields:

  • For each image, the atlas metadata tracks the illumination type and target for each of the n channels present. The channel # corresponds to the position of the channel in the zarr image (starting with 0).

Channel Illumination Type

  • Key: Raw_Image_Channel#_IlluminationType

  • Description: The illumination type is the method used to capture the channel.

  • Value: List[String]. Each element can be one of the following: Transmitted, Fluorescence, Oblique, Nonlinear, and Other.

Channel Targets

  • Key: Raw_Image_Channel#_Target

  • Description: The target field is a descriptive channel parameter rather than an ontology-driven factor. It specifies the molecular or cellular feature imaged in that channel. The most common targets are: DNA, membrane, or a particular gene.

  • Value: List[String]. Each element SHOULD be one of the following: DNA, Membrane, or the approved gene symbol (HGNC) or UniProt accession for images with a fluorescence illumination type. For images with a brightfield illumination type, these channels will have “Transmitted Light” in this field.
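
As a rough illustration of the channel fields, the sketch below builds one pair of keys per channel. It rests on assumptions that the text above does not settle: that `#` is replaced by the 0-based channel index, and that each value is stored as a one-element list to match the `List[String]` typing. The function name is hypothetical.

```python
ILLUMINATION_TYPES = {"Transmitted", "Fluorescence", "Oblique", "Nonlinear", "Other"}


def channel_metadata(channels: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Build Raw_Image_Channel#_* keys from (illumination_type, target) pairs.

    `channels` must be given in the same order as the channels in the zarr image.
    """
    metadata = {}
    for index, (illumination, target) in enumerate(channels):
        if illumination not in ILLUMINATION_TYPES:
            raise ValueError(f"unknown illumination type for channel {index}: {illumination}")
        # Assumption: one-element lists satisfy the List[String] value type.
        metadata[f"Raw_Image_Channel{index}_IlluminationType"] = [illumination]
        metadata[f"Raw_Image_Channel{index}_Target"] = [target]
    return metadata


example = channel_metadata([("Fluorescence", "DNA"), ("Transmitted", "Transmitted Light")])
```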

Cell Line Fields:

| DCA | CZI Crossmodal | Matching Ontology? |
| --- | --- | --- |
| characteristics[Source_Raw_Path] | Not Applicable | No |
| characteristics[Source_Seg_Path] | Not Applicable | No |
| characteristics[Internal_S3_Path] | Not Applicable | No |
| characteristics[Raw_Image_Channel#_IlluminationType] | Not Applicable | No |
| characteristics[Raw_Image_Channel#_Target] | Not Applicable | No |
| factor[cell_ontology_id] | Not Applicable | Yes (CL) |
| factor[cell_line] | Not Applicable | No (Cellosaurus) |

Cross-modality mapping for cryoET data portal

On-Disk Dataset Metadata

AssayDetails Metadata

| XMS-1.1.0 Field | cryoET Field | Requirement | Description | Constraints and Comments |
| --- | --- | --- | --- | --- |
| assay | assay | MUST | Defines the human-readable assay name that was used to create the dataset. | string |
| assay_ontology_term_id | assay_ontology_term_id | MUST | EFO ID corresponding to the assay(s) used. | string, MUST be EFO ID |

Author Metadata

| cryoET Field | Requirement | Description | Constraints and Comments |
| --- | --- | --- | --- |
| name | MUST | The full name of the author. | string |
| orcid | RECOMMENDED | The author’s ORCID. | string, MUST match ORCID format |
| primary_author_status | SHOULD | Whether author should be considered first author. | bool |
| corresponding_author_status | SHOULD | Whether author should be considered corresponding author. | bool |
| kaggle_id | OPTIONAL | The author’s Kaggle user ID. | string |
| email | OPTIONAL | The author’s email address. | string |
| affiliation_name | OPTIONAL | The name of the institution the author is affiliated with. | string |
| affiliation_identifier | OPTIONAL | A Research Organization Registry (ROR) identifier. | string |
| affiliation_address | OPTIONAL | The address of the institution the author is affiliated with. | string |
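
The `orcid` constraint can be illustrated with a small check. The table only requires that the value match the ORCID format; the check-digit step below (ISO 7064 MOD 11-2, which ORCID iDs use) goes beyond that and is included as an optional extra. The function name is hypothetical, and the sketch assumes the bare 16-character form without a URL prefix.

```python
import re

# Four groups of four characters; the last character may be the check value X.
_ORCID = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def is_valid_orcid(orcid: str) -> bool:
    """Check the ORCID shape and its ISO 7064 MOD 11-2 check character."""
    if not _ORCID.match(orcid):
        return False
    characters = orcid.replace("-", "")
    total = 0
    for digit in characters[:-1]:
        total = (total + int(digit)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return characters[-1] == expected
```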

CellComponent Metadata

| cryoET Field | Requirement | Description | Constraints and Comments |
| --- | --- | --- | --- |
| name | MUST | Name of the cellular component. | string |
| id | MUST | The GO identifier for the cellular component or "not_reported" | string, see CellComponent.id below |

CellComponent.id

If the dataset’s cryoET sample_type is "organelle", then the value MUST be a valid descendant of "GO:0005575" for cellular component.


If the dataset’s cryoET sample_type is "virus", then the value MUST be "GO:0044423" for virion component.


If the dataset’s cryoET sample_type is any other type, then the value MUST be "not_reported".
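
The three CellComponent.id rules translate into a simple dispatch on `sample_type`. This is a sketch, not the portal's validator: `check_cell_component_id` is a hypothetical name, and `is_descendant` is a caller-supplied function (for example, backed by an ontology lookup) because descendant checks need access to the Gene Ontology itself.

```python
from typing import Callable


def check_cell_component_id(
    sample_type: str,
    value: str,
    is_descendant: Callable[[str, str], bool],
) -> bool:
    """Apply the CellComponent.id rules for the given cryoET sample_type."""
    if sample_type == "organelle":
        # MUST be a valid descendant of GO:0005575 (cellular component).
        return value.startswith("GO:") and is_descendant(value, "GO:0005575")
    if sample_type == "virus":
        # MUST be exactly GO:0044423 (virion component).
        return value == "GO:0044423"
    # Any other sample type MUST use "not_reported".
    return value == "not_reported"
```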


CellStrain Metadata

| cryoET Field | Requirement | Description | Constraints and Comments |
| --- | --- | --- | --- |
| name | MUST | Strain information for the sample. | string |
| id | MUST | The cell line’s Cellosaurus term, strain ID, or “not_reported” | string, see CellStrain.id below |

CellStrain.id

If the dataset’s cryoET sample_type is "cell_line", then the value MUST be a valid Cellosaurus term.


If the dataset’s cryoET sample_type is any other type, then the value MAY be any other strain ID or "not_reported".


CellType Metadata

| XMS-1.1.0 Field | cryoET Field | Requirement | Description | Constraints and Comments |
| --- | --- | --- | --- | --- |
| tissue | name | MUST | Name of the cell type from which a biological sample used in a CryoET study is derived, or the name of the cell line used. | string |
| tissue_ontology_term_id | id | MUST | The UBERON or Cell Ontology identifier for the tissue or "not_reported" | string, see CellType.id below |

CellType.id

If the dataset’s cryoET sample_type is "primary_cell_culture", the following Cell Ontology (CL) terms MUST NOT be used:

| For the corresponding OrganismDetails.taxonomy_id | Value |
| --- | --- |
| "NCBITaxon:6239" for *Caenorhabditis elegans* | The value MUST be either a CL term or the most accurate descendant of WBbt:0004017 for Cell excluding WBbt:0006803 for Nucleus and its descendants |
| "NCBITaxon:7955" for *Danio rerio* | The value MUST be either a CL term or the most accurate descendant of ZFA:0009000 for cell |
| "NCBITaxon:7227" for *Drosophila melanogaster* | The value MUST be either a CL term or the most accurate descendant of FBbt:00007002 for cell |

Otherwise, for all other organisms, the value MUST be a CL or UBERON term.


If the dataset’s cryoET sample_type is any other type, the value MAY follow the same rules as above, otherwise MUST be "not_reported".


CrossReferences Metadata

| cryoET Field | Requirement | Description | Constraints and Comments |
| --- | --- | --- | --- |
| publications | RECOMMENDED | Comma-separated list of DOIs for publications associated with the dataset. | string, MUST be DOI format |
| related_database_entries | RECOMMENDED | Comma-separated list of related database entries for the dataset. | string, MUST be in appropriate format (EMPIAR-XXXXX, PDB-XXXX, EMDB-XXXXX) |
| related_database_links | OPTIONAL | Comma-separated list of related database links for the dataset. | string |
| dataset_citations | OPTIONAL | Comma-separated list of DOIs for publications citing the dataset. | string |
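
The comma-separated `related_database_entries` value can be split and checked against the three patterns named in the table. This sketch reads the placeholders literally, assuming each `X` stands for one digit in EMPIAR and EMDB entries and one alphanumeric character in PDB entries; the real accession rules of those databases may be looser. Function names are hypothetical.

```python
import re

# Assumption: each X in the table's patterns is one character, as described above.
_ENTRY = re.compile(r"^(EMPIAR-[0-9]{5}|EMDB-[0-9]{5}|PDB-[0-9A-Za-z]{4})$")


def split_list(value: str) -> list[str]:
    """Split a comma-separated string into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def invalid_database_entries(value: str) -> list[str]:
    """Return the entries that do not match any of the expected patterns."""
    return [entry for entry in split_list(value) if not _ENTRY.match(entry)]
```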


DateStamp Metadata

| cryoET Field | Requirement | Description | Constraints and Comments |
| --- | --- | --- | --- |
| deposition_date | MUST | The date a data item was received by the cryoET data portal. | date |
| release_date | MUST | The date a data item was published by the cryoET data portal. | date |
| last_modified_date | MUST | The date a piece of data was last modified on the cryoET data portal. | date |


DevelopmentStageDetails Metadata

| XMS-1.1.0 Field | cryoET Field | Requirement | Description | Constraints and Comments |
| --- | --- | --- | --- | --- |
| development_stage | development_stage | MUST | Defines the development stage(s) of the patients or organisms from which assayed biosamples were derived. | string |
| development_stage_ontology_term_id | development_stage_ontology_term_id | MUST | Organism-specific ontology ID corresponding to the development stage(s). | string, see development_stage_ontology_term_id below |

DevelopmentStageDetails.development_stage_ontology_term_id
Type: string

If the dataset’s cryoET sample_type is "cell_line", the value MUST be "na".

If unavailable, the value MUST be "unknown".

| For the corresponding OrganismDetails.taxonomy_id | Value |
| --- | --- |
| "NCBITaxon:6239" for Caenorhabditis elegans | The value MUST be WBls:0000669 for unfertilized egg Ce, the most accurate descendant of WBls:0000803 for C. elegans life stage occurring during embryogenesis, or the most accurate descendant of WBls:0000804 for C. elegans life stage occurring post embryogenesis |
| "NCBITaxon:7955" for Danio rerio | The value MUST be the most accurate descendant of ZFS:0100000 for zebrafish stage excluding ZFS:0000000 for Unknown |
| "NCBITaxon:7227" for Drosophila melanogaster | The value MUST be either the most accurate descendant of FBdv:00007014 for adult age in days or the most accurate descendant of FBdv:00005259 for developmental stage excluding FBdv:00007012 for life stage |
| "NCBITaxon:9606" for Homo sapiens | The value MUST be the most accurate descendant of HsapDv:0000001 for life cycle |
| "NCBITaxon:10090" for Mus musculus or one of its descendants | The value MUST be the most accurate descendant of MmusDv:0000001 for life cycle |

Otherwise, for all other organisms, the value MUST be the most accurate descendant of UBERON:0000105 for life cycle stage, excluding UBERON:0000071 for death stage.
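
The organism-specific rules for `development_stage_ontology_term_id` can be captured as a lookup table plus a descendant test. This is a sketch under assumptions: the names are hypothetical, `is_descendant` is supplied by the caller because it needs ontology access, only the exact taxon `NCBITaxon:10090` is matched (descendants of Mus musculus are not resolved), and the "most accurate" requirement is a curation judgment that code cannot verify.

```python
from typing import Callable

# taxonomy_id -> exact terms allowed, root terms whose descendants are allowed,
# and terms that are explicitly excluded. Values are copied from the rules above.
STAGE_RULES = {
    "NCBITaxon:6239": {
        "exact": {"WBls:0000669"},
        "roots": {"WBls:0000803", "WBls:0000804"},
        "excluded": set(),
    },
    "NCBITaxon:7955": {"exact": set(), "roots": {"ZFS:0100000"}, "excluded": {"ZFS:0000000"}},
    "NCBITaxon:7227": {
        "exact": set(),
        "roots": {"FBdv:00007014", "FBdv:00005259"},
        "excluded": {"FBdv:00007012"},
    },
    "NCBITaxon:9606": {"exact": set(), "roots": {"HsapDv:0000001"}, "excluded": set()},
    "NCBITaxon:10090": {"exact": set(), "roots": {"MmusDv:0000001"}, "excluded": set()},
}
DEFAULT_RULE = {"exact": set(), "roots": {"UBERON:0000105"}, "excluded": {"UBERON:0000071"}}


def check_development_stage(
    sample_type: str,
    taxonomy_id: str,
    value: str,
    is_descendant: Callable[[str, str], bool],
) -> bool:
    """Apply the development_stage_ontology_term_id rules to one value."""
    if sample_type == "cell_line":
        return value == "na"
    if value == "unknown":
        return True
    rule = STAGE_RULES.get(taxonomy_id, DEFAULT_RULE)
    if value in rule["excluded"]:
        return False
    if value in rule["exact"]:
        return True
    return any(is_descendant(value, root) for root in rule["roots"])
```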


DiseaseDetails Metadata

| XMS-1.1.0 Field | cryoET Field | Requirement | Description | Constraints and Comments |
| --- | --- | --- | --- | --- |
| disease | disease | MUST | Defines the disease(s) of the patients or organisms from which assayed biosamples were derived. | string |
| disease_ontology_term_id | disease_ontology_term_id | MUST | The ontology term ID(s) corresponding to the disease state(s). | string. The value MUST be one of: “PATO:0000461” for normal or healthy, the most accurate descendant of “MONDO:0000001” for disease, or “MONDO:0021178” for injury or preferably its most accurate descendant |


FundingDetails Metadata

| cryoET Field | Requirement | Description | Constraints and Comments |
| --- | --- | --- | --- |
| funding_agency_name | RECOMMENDED | The name of the funding source. | string |
| grant_id | RECOMMENDED | Grant identifier provided by the funding agency. | string |


OrganismDetails Metadata

| XMS-1.1.0 Field | cryoET Field | Requirement | Description | Constraints and Comments |
| --- | --- | --- | --- | --- |
| organism | name | MUST | Name of the organism(s) from which a biological sample used in a CryoET study is derived, e.g. Homo sapiens. | string, not_reported if id is None |
| organism_ontology_term_id | taxonomy_id | MUST | The NCBI taxon ID(s) of the organism(s) | integer, see taxonomy_id below |

OrganismDetails.taxonomy_id
Type: integer

If the corresponding sample_type is "organism", "tissue", "cell", "organoid", "organelle" or "virus" the value MUST be an NCBI organismal classification term such as "9606"


If the corresponding sample_type is "in_vitro", "in_silico" or "other", the value MAY be an NCBI organismal classification term such as "9606", otherwise it MUST be None.
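
The two taxonomy_id rules can be sketched as a check on `sample_type`. The function name is hypothetical. The sketch only confirms that a value is a positive integer or None; confirming that the integer is a real NCBI taxon would need a lookup that is out of scope here. Sample types not named in either rule raise an error rather than being guessed at.

```python
from typing import Optional

# Sample types as listed in the two rules above.
REQUIRES_TAXON = {"organism", "tissue", "cell", "organoid", "organelle", "virus"}
OPTIONAL_TAXON = {"in_vitro", "in_silico", "other"}


def _is_taxon_id(value: object) -> bool:
    """True for positive integers (bool is excluded although it subclasses int)."""
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def check_taxonomy_id(sample_type: str, taxonomy_id: Optional[int]) -> bool:
    """Apply the OrganismDetails.taxonomy_id rules for one sample_type."""
    if sample_type in REQUIRES_TAXON:
        return _is_taxon_id(taxonomy_id)
    if sample_type in OPTIONAL_TAXON:
        return taxonomy_id is None or _is_taxon_id(taxonomy_id)
    raise ValueError(f"sample_type not covered by the rules above: {sample_type}")
```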


PicturePath

| cryoET Field | Requirement | Description | Constraints and Comments |
| --- | --- | --- | --- |
| snapshot | RECOMMENDED | Path to the preview image relative to the entity directory root. | string. API: MUST be URL format. Metadata: MUST be relative path from dataset root. |
| thumbnail | RECOMMENDED | Path to the thumbnail of the preview image relative to the entity directory root. | string. API: MUST be URL format. Metadata: MUST be relative path from dataset root. |


SampleType Enum

| XMS-1.1.0 tissue_type value | cryoET value | Description |
| --- | --- | --- |
| tissue | organism | Tomographic data of sections through multicellular organisms |
| tissue | tissue | Tomographic data of tissue sections |
| cell line | cell_line | Tomographic data of immortalized cells or immortalized cell sections |
| cell culture | primary_cell_culture | Tomographic data of whole primary cells or primary cell sections |
| organoid | organoid | Tomographic data of organoid-derived samples |
| organelle | organelle | Tomographic data of purified organelles |
| organelle | virus | Tomographic data of purified viruses or VLPs |
| not registered/mapped in 1.1.0 | in_vitro | Tomographic data of in vitro reconstituted systems or mixtures of proteins |
| not registered/mapped in 1.1.0 | in_silico | Simulated tomographic data |
| not registered/mapped in 1.1.0 | other | Other type of sample |


TissueDetails Metadata

| XMS-1.1.0 Field | cryoET Field | Requirement | Description | Constraints and Comments |
| --- | --- | --- | --- | --- |
| tissue | name | MUST | Name of the tissue from which a biological sample used in a CryoET study is derived. | string |
| tissue_ontology_term_id | id | MUST | The UBERON identifier for the tissue or "not_reported" | string, see TissueDetails.id below |

TissueDetails.id

Type: string

If the dataset’s cryoET sample_type is "organism", "tissue" or "organoid" then:

| For the corresponding OrganismDetails.taxonomy_id | Value |
| --- | --- |
| "NCBITaxon:6239" for *Caenorhabditis elegans* | The value MUST be either an UBERON term or the most accurate descendant of WBbt:0005766 for Anatomy excluding WBbt:0007849 for hermaphrodite, WBbt:0007850 for male, WBbt:0008595 for female, WBbt:0004017 for Cell and its descendants, and WBbt:0006803 for Nucleus and its descendants |
| "NCBITaxon:7955" for *Danio rerio* | The value MUST be either an UBERON term or the most accurate descendant of ZFA:0100000 for zebrafish anatomical entity excluding ZFA:0001093 for unspecified and ZFA:0009000 for cell and its descendants |
| "NCBITaxon:7227" for *Drosophila melanogaster* | The value MUST be either an UBERON term or the most accurate descendant of FBbt:10000000 for anatomical entity excluding FBbt:00007002 for cell and its descendants |
| For all other organisms | The value MUST be the most accurate descendant of UBERON:0001062 for anatomical entity. |


If the dataset’s cryoET sample_type is "primary_cell_culture", "cell_line" or "organelle" the value MAY follow the definition for "tissue", otherwise it MUST be "not_reported".


If the dataset’s cryoET sample_type is "virus", "in_vitro", "in_silico" or "other" then the value MUST be "not_reported".


Dataset Metadata

| cryoET Field | Requirement | Description | Constraints and Comments |
| --- | --- | --- | --- |
| deposition_id | MUST | An identifier for a CryoET deposition, assigned by the Data Portal. Used to identify the deposition the entity is a part of. | integer |
| last_updated_at | MUST | POSIX timestamp of the last time this metadata file was updated. | float |
| key_photos | MUST | A set of paths to representative images of a piece of data for metadata files. | PicturePath metadata |
| dataset_identifier | MUST | An identifier for a CryoET dataset, assigned by the Data Portal. Used to identify the dataset as the directory name in data tree. | integer |
| dataset_title | MUST | Title of a CryoET dataset. | string |
| dataset_description | MUST | A short description of a CryoET dataset, similar to an abstract for a journal article or dataset. | string |
| dates | MUST | A set of dates at which a data item was deposited, published and last modified. | DateStamp |
| authors | MUST | Author of a scientific data entity. | list of Author metadata, min length=1 |
| funding | RECOMMENDED | A funding source for a scientific data entity (base for JSON and DB representation). | list of FundingDetails metadata |
| cross_references | OPTIONAL | A set of cross-references to other databases and publications. | CrossReferences metadata |
| sample_type | MUST | Type of sample imaged in a CryoET study. | SampleTypeEnum value |
| sample_preparation | RECOMMENDED | Describes how the sample was prepared. | string |
| grid_preparation | RECOMMENDED | Describes Cryo-ET grid preparation. | string |
| other_setup | RECOMMENDED | Describes other setup not covered by sample preparation or grid preparation that may make this dataset unique in the same publication. | string |
| organism | MUST | The species from which the sample was derived. | OrganismDetails metadata |
| tissue | MUST | The type of tissue from which the sample was derived. | TissueDetails metadata |
| cell_type | MUST | The cell type from which the sample was derived. | CellType metadata |
| cell_strain | MUST | The strain or cell line from which the sample was derived. | CellStrain metadata |
| cell_component | MUST | The cellular component from which the sample was derived. | CellComponent metadata |
| assay | MUST | Defines the assay(s) that was used to create the dataset. | AssayDetails metadata |
| development_stage | MUST | Defines the development stage(s) of the patients or organisms from which assayed biosamples were derived. | DevelopmentStageDetails metadata |
| disease | MUST | Defines the disease(s) of the patients or organisms from which assayed biosamples were derived. | DiseaseDetails metadata |
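
A presence check for the MUST fields of the Dataset metadata could look like the sketch below. It assumes the metadata file has been parsed into a dictionary whose top-level keys are the cryoET field names from the table; the function name is hypothetical, and nested objects are not validated here.

```python
# Fields marked MUST in the Dataset Metadata table.
REQUIRED_DATASET_FIELDS = (
    "deposition_id", "last_updated_at", "key_photos", "dataset_identifier",
    "dataset_title", "dataset_description", "dates", "authors", "sample_type",
    "organism", "tissue", "cell_type", "cell_strain", "cell_component",
    "assay", "development_stage", "disease",
)


def missing_required_fields(metadata: dict) -> list[str]:
    """Return the names of MUST fields that are absent or, for authors, empty."""
    missing = [name for name in REQUIRED_DATASET_FIELDS if name not in metadata]
    # `authors` additionally needs at least one entry (min length=1).
    if "authors" in metadata and len(metadata["authors"]) < 1:
        missing.append("authors")
    return missing
```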

Database and API Mapping

The Dataset metadata maps to the database, GraphQL API, and Python API client as shown below.

| DB Column | DB Type | PK/FK | Nullable? | GraphQL API Field | GraphQL API Type | Python Client Field | Python Client Type | Mapped AWS S3 Metadata Field |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| id | Integer | PK | No | id | Int! | Dataset.id | int | Dataset.dataset_identifier |
| deposition_id | Integer | FK | No | depositionId | ID | Dataset.deposition_id | int | Dataset.deposition_id |
| title | String | | No | title | String! | Dataset.title | str | Dataset.dataset_title |
| description | String | | No | description | String! | Dataset.description | str | Dataset.dataset_description |
| organism_name | String | | No | organismName | String! | Dataset.organism_name | str | Dataset.organism.name |
| organism_taxid | Integer | | No | organismTaxid | Int! | Dataset.organism_taxid | int | Dataset.organism.taxonomy_id |
| tissue_name | String | | No | tissueName | String! | Dataset.tissue_name | str | Dataset.tissue.name |
| tissue_id | String | | No | tissueId | String! | Dataset.tissue_id | str | Dataset.tissue.id |
| cell_name | String | | No | cellName | String! | Dataset.cell_name | str | Dataset.cell_type.name |
| cell_type_id | String | | No | cellTypeId | String! | Dataset.cell_type_id | str | Dataset.cell_type.id |
| cell_strain_name | String | | No | cellStrainName | String! | Dataset.cell_strain_name | str | Dataset.cell_strain.name |
| cell_strain_id | String | | No | cellStrainId | String! | Dataset.cell_strain_id | str | Dataset.cell_strain.id |
| sample_type | Enum | | No | sampleType | sample_type_enum! | Dataset.sample_type | str | Dataset.sample_type |
| sample_preparation | String | | Yes | samplePreparation | String | Dataset.sample_preparation | str | Dataset.sample_preparation |
| grid_preparation | String | | Yes | gridPreparation | String | Dataset.grid_preparation | str | Dataset.grid_preparation |
| other_setup | String | | Yes | otherSetup | String | Dataset.other_setup | str | Dataset.other_setup |
| key_photo_url | String | | Yes | keyPhotoUrl | String | Dataset.key_photo_url | str | Dataset.key_photos.snapshot |
| key_photo_thumbnail_url | String | | Yes | keyPhotoThumbnailUrl | String | Dataset.key_photo_thumbnail_url | str | Dataset.key_photos.thumbnail |
| cell_component_name | String | | No | cellComponentName | String! | Dataset.cell_component_name | str | Dataset.cell_component.name |
| cell_component_id | String | | No | cellComponentId | String! | Dataset.cell_component_id | str | Dataset.cell_component.id |
| deposition_date | DateTime | | No | depositionDate | DateTime! | Dataset.deposition_date | date | Dataset.dates.deposition_date |
| release_date | DateTime | | No | releaseDate | DateTime! | Dataset.release_date | date | Dataset.dates.release_date |
| last_modified_date | DateTime | | No | lastModifiedDate | DateTime! | Dataset.last_modified_date | date | Dataset.dates.last_modified_date |
| dataset_publications | String | | Yes | datasetPublications | String | Dataset.dataset_publications | str | Dataset.cross_references.publications |
| related_database_entries | String | | Yes | relatedDatabaseEntries | String | Dataset.related_database_entries | str | Dataset.cross_references.related_database_entries |
| s3_prefix | String | | No | s3Prefix | String! | Dataset.s3_prefix | str | Dataset.s3_prefix |
| https_prefix | String | | No | httpsPrefix | String! | Dataset.https_prefix | str | Dataset.https_prefix |
| file_size | Float | | Yes | fileSize | Float | Dataset.file_size | float | computed during DB import |
| assay_name | String | | No | assayName | String! | Dataset.assay_name | str | Dataset.assay.assay |
| assay_ontology_term_id | String | | No | assayOntologyTermId | String! | Dataset.assay_ontology_term_id | str | Dataset.assay.assay_ontology_term_id |
| development_stage | String | | No | developmentStage | String! | Dataset.development_stage | str | Dataset.development_stage.development_stage |
| development_stage_ontology_term_id | String | | No | developmentStageOntologyTermId | String! | Dataset.development_stage_ontology_term_id | str | Dataset.development_stage.development_stage_ontology_term_id |
| disease | String | | No | disease | String! | Dataset.disease | str | Dataset.disease.disease |
| disease_ontology_term_id | String | | No | diseaseOntologyTermId | String! | Dataset.disease_ontology_term_id | str | Dataset.disease.disease_ontology_term_id |
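
In the mapping above, every GraphQL field name is the camelCase form of the snake_case DB column. The helper below reproduces that pattern; it describes the naming as observed in this table and is not a documented rule of the API. The function name is hypothetical.

```python
def graphql_field_name(db_column: str) -> str:
    """Convert a snake_case DB column name to the camelCase GraphQL field name."""
    head, *rest = db_column.split("_")
    return head + "".join(part.capitalize() for part in rest)
```

For example, `organism_taxid` becomes `organismTaxid`, and single-word columns such as `id` are unchanged.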

Mapping to XMS 1.1.0

XMS-1.1.0 metadata Mapping

Mapping will be specified in terms of Python API client fields (as that is what will be used in automatic MDR registration).

| XMS-1.1.0 | Python Client Field | Notes |
| --- | --- | --- |
| assay_name | Dataset.assay_name | convert to list of string |
| assay_ontology_term_id | Dataset.assay_ontology_term_id | convert to list of string |
| development_stage | Dataset.development_stage | convert to list of string |
| development_stage_ontology_term_id | Dataset.development_stage_ontology_term_id | convert to list of string |
| disease | Dataset.disease | convert to list of string |
| disease_ontology_term_id | Dataset.disease_ontology_term_id | convert to list of string |
| organism | Dataset.organism_name | convert to list of string |
| organism_ontology_term_id | Dataset.organism_taxid | Convert to list of string, prepend “NCBITaxon:”. If None, exclude dataset. |
| tissue | depends on Dataset.sample_type | See tissue mapping rules below |
| tissue_ontology_term_id | depends on Dataset.sample_type | See tissue_ontology_term_id mapping rules below |
| tissue_type | depends on Dataset.sample_type | See tissue_type mapping rules below |
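
The fields in this table that do not depend on `sample_type` can be mapped with a few lines. This is a sketch under assumptions: "convert to list of string" is read as wrapping the single client value in a one-element list, `to_xms_record` is a hypothetical name, and the example uses a plain `SimpleNamespace` stand-in with the attribute names from the table rather than the real Python client.

```python
from types import SimpleNamespace

# XMS-1.1.0 field -> Python client attribute, for the directly mapped fields.
SIMPLE_FIELDS = {
    "assay_name": "assay_name",
    "assay_ontology_term_id": "assay_ontology_term_id",
    "development_stage": "development_stage",
    "development_stage_ontology_term_id": "development_stage_ontology_term_id",
    "disease": "disease",
    "disease_ontology_term_id": "disease_ontology_term_id",
    "organism": "organism_name",
}


def to_xms_record(dataset):
    """Map one dataset to XMS fields, or return None if it is to be excluded."""
    if dataset.organism_taxid is None:
        return None  # the table says to exclude datasets without a taxon ID
    # Assumption: each single value becomes a one-element list of strings.
    record = {xms: [str(getattr(dataset, attr))] for xms, attr in SIMPLE_FIELDS.items()}
    record["organism_ontology_term_id"] = [f"NCBITaxon:{dataset.organism_taxid}"]
    return record


# Stand-in object with illustrative values; not a real portal dataset.
example = SimpleNamespace(
    assay_name="example assay",
    assay_ontology_term_id="EFO:0000001",
    development_stage="unknown",
    development_stage_ontology_term_id="unknown",
    disease="normal",
    disease_ontology_term_id="PATO:0000461",
    organism_name="Homo sapiens",
    organism_taxid=9606,
)
record = to_xms_record(example)
```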

XMS-1.1.0 tissue_type mapping

Sample types are mapped as follows:

| XMS-1.1.0 tissue_type value | cryoET value | Description |
| --- | --- | --- |
| tissue | organism | Tomographic data of sections through multicellular organisms |
| tissue | tissue | Tomographic data of tissue sections |
| cell line | cell_line | Tomographic data of immortalized cells or immortalized cell sections |
| cell culture | primary_cell_culture | Tomographic data of whole primary cells or primary cell sections |
| organoid | organoid | Tomographic data of organoid-derived samples |
| organelle | organelle | Tomographic data of purified organelles |
| organelle | virus | Tomographic data of purified viruses or VLPs |
| not registered/mapped in 1.1.0 | in_vitro | Tomographic data of in vitro reconstituted systems or mixtures of proteins |
| not registered/mapped in 1.1.0 | in_silico | Simulated tomographic data |
| not registered/mapped in 1.1.0 | other | Other type of sample |

XMS-1.1.0 tissue_ontology_term_id mapping

If cryoET sample_type is "organism" or "tissue", XMS-1.1.0 tissue_type is "tissue". XMS-1.1.0 tissue and tissue_ontology_term_id are mapped to the following Python client fields:

| XMS-1.1.0 | Python Client Field | Notes |
| --- | --- | --- |
| tissue | Dataset.tissue_name | convert to list of string |
| tissue_ontology_term_id | Dataset.tissue_id | convert to list of string |

If cryoET sample_type is "cell_line", XMS-1.1.0 tissue_type is "cell line". XMS-1.1.0 tissue and tissue_ontology_term_id are mapped to the following Python client fields:

| XMS-1.1.0 | Python Client Field | Notes |
| --- | --- | --- |
| tissue | Dataset.cell_strain_name | convert to list of string |
| tissue_ontology_term_id | Dataset.cell_strain_id | convert to list of string |

If cryoET sample_type is "primary_cell_culture", XMS-1.1.0 tissue_type is "cell culture". XMS-1.1.0 tissue and tissue_ontology_term_id are mapped to the following Python client fields:

| XMS-1.1.0 | Python Client Field | Notes |
| --- | --- | --- |
| tissue | Dataset.cell_name | convert to list of string |
| tissue_ontology_term_id | Dataset.cell_type_id | convert to list of string |

If cryoET sample_type is "organoid", XMS-1.1.0 tissue_type is "organoid". XMS-1.1.0 tissue and tissue_ontology_term_id are mapped to the following Python client fields:

| XMS-1.1.0 | Python Client Field | Notes |
| --- | --- | --- |
| tissue | Dataset.tissue_name | convert to list of string |
| tissue_ontology_term_id | Dataset.tissue_id | convert to list of string |

If cryoET sample_type is "organelle" or "virus", XMS-1.1.0 tissue_type is "organelle". XMS-1.1.0 tissue and tissue_ontology_term_id are mapped to the following Python client fields:

| XMS-1.1.0 | Python Client Field | Notes |
| --- | --- | --- |
| tissue | Dataset.cell_component_name | convert to list of string |
| tissue_ontology_term_id | Dataset.cell_component_id | convert to list of string |
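
The sample-type-dependent rules in this section can be collected into a single lookup. This is a sketch: `xms_tissue_fields` is a hypothetical name, the dataset is a `SimpleNamespace` stand-in carrying the Python client attribute names, "convert to list of string" is again read as wrapping the value in a one-element list, and `tissue_type` is left as a plain string because no conversion is stated for it.

```python
from types import SimpleNamespace

# cryoET sample_type -> (XMS tissue_type, client name attribute, client ID attribute)
TISSUE_RULES = {
    "organism": ("tissue", "tissue_name", "tissue_id"),
    "tissue": ("tissue", "tissue_name", "tissue_id"),
    "cell_line": ("cell line", "cell_strain_name", "cell_strain_id"),
    "primary_cell_culture": ("cell culture", "cell_name", "cell_type_id"),
    "organoid": ("organoid", "tissue_name", "tissue_id"),
    "organelle": ("organelle", "cell_component_name", "cell_component_id"),
    "virus": ("organelle", "cell_component_name", "cell_component_id"),
}


def xms_tissue_fields(dataset):
    """Return the three XMS tissue fields, or None for unmapped sample types."""
    rule = TISSUE_RULES.get(dataset.sample_type)
    if rule is None:
        return None  # in_vitro, in_silico and other are not mapped in 1.1.0
    tissue_type, name_attr, id_attr = rule
    return {
        "tissue_type": tissue_type,
        "tissue": [getattr(dataset, name_attr)],
        "tissue_ontology_term_id": [getattr(dataset, id_attr)],
    }


# Stand-in object with illustrative values; not a real portal dataset.
virus_dataset = SimpleNamespace(
    sample_type="virus",
    cell_component_name="virion component",
    cell_component_id="GO:0044423",
)
fields = xms_tissue_fields(virus_dataset)
```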

Changelog

v1.0.0

  • Published minimal set of metadata requirements


Sourced from https://github.com/chanzuckerberg/data-guidance/blob/main/standards/imaging/1.0.0/schema.md