Cross Modality Schema

Contact: brianraymor@chanzuckerberg.com

Document Status: Approved

Version: 1.1.0

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14, RFC2119, and RFC8174 when, and only when, they appear in all capitals, as shown here.

Schema versioning

The cross modality schema version is based on Semantic Versioning.

Major version is incremented when incompatible schema updates are introduced:

  • Renaming metadata fields

  • Deprecating metadata fields

  • Changing the type or format of a metadata field

Minor version is incremented when additive schema updates are introduced:

  • Adding metadata fields

  • Changing the validation requirements for a metadata field

Patch version is incremented for editorial updates.
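
The versioning rules above can be read as a small decision procedure. The following is a minimal sketch, not part of the schema: the `Change` enum and `next_version` function are hypothetical names introduced for illustration, and resetting lower components to zero follows Semantic Versioning rather than anything stated in this document.

```python
from enum import Enum


class Change(Enum):
    """Kinds of schema update named in the versioning rules (hypothetical enum)."""
    RENAME_FIELD = "rename"
    DEPRECATE_FIELD = "deprecate"
    CHANGE_TYPE_OR_FORMAT = "change_type_or_format"
    ADD_FIELD = "add"
    CHANGE_VALIDATION = "change_validation"
    EDITORIAL = "editorial"


# Incompatible updates bump the major version; additive updates bump the minor.
MAJOR_CHANGES = {Change.RENAME_FIELD, Change.DEPRECATE_FIELD, Change.CHANGE_TYPE_OR_FORMAT}
MINOR_CHANGES = {Change.ADD_FIELD, Change.CHANGE_VALIDATION}


def next_version(current: str, changes: list[Change]) -> str:
    """Return the next schema version for a set of changes."""
    major, minor, patch = (int(part) for part in current.split("."))
    if not changes:
        return current
    if any(change in MAJOR_CHANGES for change in changes):
        return f"{major + 1}.0.0"
    if any(change in MINOR_CHANGES for change in changes):
        return f"{major}.{minor + 1}.0"
    # Only editorial updates remain.
    return f"{major}.{minor}.{patch + 1}"
```

For example, `next_version("1.0.0", [Change.ADD_FIELD])` yields `"1.1.0"`, which matches the kind of additive update recorded for schema v1.1.0 in the Changelog.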

All changes are documented in the schema Changelog.

Descriptive Metadata

The following descriptive metadata MUST be associated with all “registered - sharing” datasets to ensure that datasets can be found by searching for common experimental and biological characteristics of datasets. This list is intentionally limited to metadata that SHOULD be annotated at the time data are generated. These metadata MUST be programmatically validated to ensure compliance. Additional metadata MAY be annotated at the discretion of data stewards.

Required Ontologies

With the exception of Cellosaurus, ontology terms for metadata MUST use OBO-format identifiers, meaning a CURIE (prefixed identifier) of the form Ontology:Identifier. For example, EFO:0000001 is a term in the Experimental Factor Ontology (EFO). Cellosaurus requires a prefixed identifier of the form Ontology_Identifier such as CVCL_1P02.
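
As an illustration of the two identifier forms, the sketch below splits a term identifier into its prefix and local part. The function name `split_term_id` and the exact character classes in the regular expressions are assumptions; the schema only specifies the overall shapes Ontology:Identifier and Ontology_Identifier.

```python
import re

# "Ontology:Identifier" form. The permitted characters are an assumption.
CURIE_RE = re.compile(r"([A-Za-z][A-Za-z0-9]*):([A-Za-z0-9]+)")
# Cellosaurus "Ontology_Identifier" form, such as CVCL_1P02.
CELLOSAURUS_RE = re.compile(r"(CVCL)_([A-Za-z0-9]+)")


def split_term_id(term_id: str) -> tuple[str, str]:
    """Return (prefix, local identifier) or raise ValueError if malformed."""
    match = CELLOSAURUS_RE.fullmatch(term_id)
    if match:
        return match.group(1), match.group(2)
    match = CURIE_RE.fullmatch(term_id)
    # Cellosaurus terms must use the underscore form, so "CVCL:..." is rejected.
    if match and match.group(1) != "CVCL":
        return match.group(1), match.group(2)
    raise ValueError(f"not a valid ontology term identifier: {term_id!r}")
```

A format check like this says nothing about whether the term actually exists in the ontology; that requires a lookup against the ontology release itself.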

If an ontology is missing a required term, ontologists are generally responsive to New Term Requests [NTR]. One example is “[NTR] Version specific Visium assays”, which was created for CELLxGENE Discover requirements.

The following ontologies are referenced in this schema:

Ontology                                 Prefix
C. elegans Development Ontology          WBls:
C. elegans Gross Anatomy Ontology        WBbt:
Cell Ontology                            CL:
Cellosaurus                              CVCL_
Drosophila Anatomy Ontology              FBbt:
Drosophila Development Ontology          FBdv:
Experimental Factor Ontology             EFO:
Gene Ontology                            GO:
Human Developmental Stages               HsapDv:
Mondo Disease Ontology                   MONDO:
Mouse Developmental Stages               MmusDv:
NCBI organismal classification           NCBITaxon:
Phenotype And Trait Ontology             PATO:
Uberon multi-species anatomy ontology    UBERON:
Zebrafish Anatomy Ontology               ZFA:, ZFS:

assay_ontology_term_id

Key assay_ontology_term_id
Description Defines the assay that was used to create the dataset
Annotator Submitter MUST annotate.
Value List[String]. The List element MUST be an Experimental Factor Ontology (EFO) term such as “EFO:0022605”.

assay

Key assay
Annotator System MUST annotate.
Value List[String]. The List element MUST be the human-readable name assigned to the corresponding element in assay_ontology_term_id.

development_stage_ontology_term_id

Key development_stage_ontology_term_id
Description Defines the development stage(s) of the patients or organisms from which assayed biosamples were derived
Annotator Submitter MUST annotate.
Value List[String]

If corresponding tissue_type is "cell line", the List element MUST be "na".

If unavailable, the List element MUST be "unknown".

For the corresponding organism_ontology_term_id, the Value requirement is:

  • "NCBITaxon:6239" for Caenorhabditis elegans: The List element MUST be WBls:0000669 for unfertilized egg Ce, the most accurate descendant of WBls:0000803 for C. elegans life stage occurring during embryogenesis, or the most accurate descendant of WBls:0000804 for C. elegans life stage occurring post embryogenesis.

  • "NCBITaxon:7955" for Danio rerio: The List element MUST be the most accurate descendant of ZFS:0100000 for zebrafish stage, excluding ZFS:0000000 for Unknown.

  • "NCBITaxon:7227" for Drosophila melanogaster: The List element MUST be either the most accurate descendant of FBdv:00007014 for adult age in days or the most accurate descendant of FBdv:00005259 for developmental stage, excluding FBdv:00007012 for life stage.

  • "NCBITaxon:9606" for Homo sapiens: The List element MUST be the most accurate descendant of HsapDv:0000001 for life cycle.

  • "NCBITaxon:10090" for Mus musculus or one of its descendants: The List element MUST be the most accurate descendant of MmusDv:0000001 for life cycle.

  • For all other organisms: The List element MUST be the most accurate descendant of UBERON:0000105 for life cycle stage, excluding UBERON:0000071 for death stage.
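
The organism-specific rules for development_stage_ontology_term_id can be sketched as a lookup table plus a descendant test. This is an illustrative outline only: `StageRule`, `rule_for`, and `stage_is_valid` are hypothetical names, and `is_descendant(term, ancestor)` is an assumed callable that the caller must back with a real ontology graph. Whether a term is the "most accurate" descendant is a curation judgment that this check does not attempt.

```python
from typing import Callable, NamedTuple

IsDescendant = Callable[[str, str], bool]  # (term, ancestor) -> bool, supplied by the caller


class StageRule(NamedTuple):
    exact: frozenset     # terms accepted as-is
    roots: tuple         # the term must descend from one of these
    excluded: frozenset  # terms that are never accepted


MOUSE = "NCBITaxon:10090"
RULES = {
    "NCBITaxon:6239": StageRule(frozenset({"WBls:0000669"}),
                                ("WBls:0000803", "WBls:0000804"), frozenset()),
    "NCBITaxon:7955": StageRule(frozenset(), ("ZFS:0100000",), frozenset({"ZFS:0000000"})),
    "NCBITaxon:7227": StageRule(frozenset(), ("FBdv:00007014", "FBdv:00005259"),
                                frozenset({"FBdv:00007012"})),
    "NCBITaxon:9606": StageRule(frozenset(), ("HsapDv:0000001",), frozenset()),
    MOUSE: StageRule(frozenset(), ("MmusDv:0000001",), frozenset()),
}
DEFAULT_RULE = StageRule(frozenset(), ("UBERON:0000105",), frozenset({"UBERON:0000071"}))


def rule_for(organism: str, is_descendant: IsDescendant) -> StageRule:
    """Pick the rule for an organism; descendants of Mus musculus use the mouse rule."""
    if organism in RULES:
        return RULES[organism]
    if is_descendant(organism, MOUSE):
        return RULES[MOUSE]
    return DEFAULT_RULE


def stage_is_valid(term: str, organism: str, tissue_type: str,
                   is_descendant: IsDescendant) -> bool:
    """Check one List element against the rules for its organism and tissue_type."""
    if tissue_type == "cell line":
        return term == "na"
    if term == "unknown":
        return True
    rule = rule_for(organism, is_descendant)
    if term in rule.excluded:
        return False
    return term in rule.exact or any(is_descendant(term, root) for root in rule.roots)
```

Passing the descendant test in as a parameter keeps the rule table independent of any particular ontology library or release.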

development_stage

Key development_stage
Annotator System MUST annotate.
Value List[String].

The List element MUST be "na" if the value of development_stage_ontology_term_id is "na".

The List element MUST be "unknown" if the value of development_stage_ontology_term_id is "unknown".

Otherwise, the List element MUST be the human-readable name assigned to the corresponding element in development_stage_ontology_term_id.
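
A system-side annotator might derive the labels along these lines. This is a minimal sketch under stated assumptions: `labels_for` is a hypothetical helper, and `term_labels` stands in for whatever term-to-name lookup the system builds from the ontology releases.

```python
def labels_for(term_ids: list[str], term_labels: dict[str, str]) -> list[str]:
    """Map each development_stage_ontology_term_id element to its development_stage label."""
    passthrough = {"na", "unknown"}  # these values are copied through unchanged
    # A KeyError here means the term is missing from the lookup, which the
    # caller should treat as a validation failure.
    return [term if term in passthrough else term_labels[term] for term in term_ids]


# Usage with a one-entry lookup; the label is the one this schema gives for WBls:0000669.
stages = labels_for(["WBls:0000669", "unknown"], {"WBls:0000669": "unfertilized egg Ce"})
```

Because the output list is built element by element, it stays aligned with development_stage_ontology_term_id.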

disease_ontology_term_id

Key disease_ontology_term_id
Description Defines the disease of the patients or organisms from which assayed biosamples were derived
Annotator Submitter MUST annotate.
Value List[String]. The List element MUST be one of:

disease

Key disease
Annotator System MUST annotate.
Value List[String]. The List element MUST be the human-readable name assigned to the corresponding element in disease_ontology_term_id.

organism_ontology_term_id

Key organism_ontology_term_id
Description Defines the organism from which assayed biosamples were derived
Annotator Submitter MUST annotate.
Value List[String]. The List element MUST be an NCBI organismal classification term such as "NCBITaxon:9606".

organism

Key organism
Annotator System MUST annotate.
Value List[String]. The List element MUST be the human-readable name assigned to the corresponding element in organism_ontology_term_id.

tissue_ontology_term_id

Key tissue_ontology_term_id
Description Defines the tissues from which assayed biosamples were derived
Annotator Submitter MUST annotate.
Value List[String]

If the corresponding tissue_type is "cell line", the List element MUST be a Cellosaurus term.

If the corresponding tissue_type is "organelle", the List element MUST be a descendant of GO:0005575
for cellular_component.

If the corresponding tissue_type is "tissue" or "organoid" then:

For the corresponding organism_ontology_term_id, the Value requirement is:

  • "NCBITaxon:6239" for Caenorhabditis elegans: The List element MUST be either an UBERON term or the most accurate descendant of WBbt:0005766 for Anatomy, excluding WBbt:0007849 for hermaphrodite, WBbt:0007850 for male, WBbt:0008595 for female, WBbt:0004017 for Cell and its descendants, and WBbt:0006803 for Nucleus and its descendants.

  • "NCBITaxon:7955" for Danio rerio: The List element MUST be either an UBERON term or the most accurate descendant of ZFA:0100000 for zebrafish anatomical entity, excluding ZFA:0001093 for unspecified and ZFA:0009000 for cell and its descendants.

  • "NCBITaxon:7227" for Drosophila melanogaster: The List element MUST be either an UBERON term or the most accurate descendant of FBbt:10000000 for anatomical entity, excluding FBbt:00007002 for cell and its descendants.

  • For all other organisms: The List element MUST be the most accurate descendant of UBERON:0001062 for anatomical entity.

If the corresponding tissue_type is "cell culture", the following Cell Ontology (CL) terms MUST NOT be used:
For the corresponding organism_ontology_term_id, the Value requirement is:

  • "NCBITaxon:6239" for Caenorhabditis elegans: The List element MUST be either a CL term or the most accurate descendant of WBbt:0004017 for Cell, excluding WBbt:0006803 for Nucleus and its descendants.

  • "NCBITaxon:7955" for Danio rerio: The List element MUST be either a CL term or the most accurate descendant of ZFA:0009000 for cell.

  • "NCBITaxon:7227" for Drosophila melanogaster: The List element MUST be either a CL term or the most accurate descendant of FBbt:00007002 for cell.

  • For all other organisms: The List element MUST be a CL term.
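
A validator could start with a coarse prefix check before consulting the ontology graph. The sketch below only answers which ontology prefixes are permitted for a given tissue_type and organism; it does not enforce the root-term or exclusion rules, which need descendant lookups. The function name `allowed_tissue_prefixes` is hypothetical.

```python
# Species-specific anatomy ontologies that may be used in place of UBERON or CL.
SPECIES_ANATOMY_PREFIX = {
    "NCBITaxon:6239": "WBbt",  # Caenorhabditis elegans
    "NCBITaxon:7955": "ZFA",   # Danio rerio
    "NCBITaxon:7227": "FBbt",  # Drosophila melanogaster
}


def allowed_tissue_prefixes(tissue_type: str, organism: str) -> set[str]:
    """Return the ontology prefixes permitted for tissue_ontology_term_id."""
    if tissue_type == "cell line":
        return {"CVCL"}
    if tissue_type == "organelle":
        return {"GO"}
    species = SPECIES_ANATOMY_PREFIX.get(organism)
    extra = {species} if species else set()
    if tissue_type in ("tissue", "organoid"):
        return {"UBERON"} | extra
    if tissue_type == "cell culture":
        return {"CL"} | extra
    raise ValueError(f"unknown tissue_type: {tissue_type!r}")
```

For instance, a "cell culture" from Danio rerio may use a CL or ZFA term, while the same tissue_type from Homo sapiens is limited to CL.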

tissue

Key tissue
Annotator System MUST annotate.
Value List[String]. The List element MUST be the human-readable name assigned to the corresponding element in tissue_ontology_term_id.

tissue_type

Key tissue_type
Annotator Submitter MUST annotate.
Value List[String]. The List element MUST be one of:
  • "cell culture"
  • "cell line"
  • "organelle"
  • "organoid"
  • "tissue"
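
Since tissue_type is a closed vocabulary, programmatic validation reduces to a membership test. A minimal sketch, with `invalid_tissue_types` as a hypothetical helper name and exact, case-sensitive matching as an assumption:

```python
TISSUE_TYPES = frozenset({"cell culture", "cell line", "organelle", "organoid", "tissue"})


def invalid_tissue_types(values: list[str]) -> list[str]:
    """Return the List elements that are not permitted tissue_type values."""
    # Comparison is exact: "Cell Line" does not match "cell line".
    return [value for value in values if value not in TISSUE_TYPES]
```

An empty result means every element of the list is valid.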

Appendix A. Changelog

schema v1.1.0

  • Required Ontologies

    • Added requirements for prefixed ontology identifiers to address the Cellosaurus exception

    • Added Cellosaurus

    • Added Gene Ontology

  • development_stage_ontology_term_id

    • Require "na" when the corresponding tissue_type is "cell line"

  • development_stage

    • Require "na" when the corresponding development_stage_ontology_term_id is "na"

  • tissue_ontology_term_id

    • Require a Cellosaurus term identifier when the corresponding tissue_type is "cell line"

    • Require a descendant of GO:0005575 for cellular_component when the corresponding tissue_type is "organelle"

  • tissue_type

    • Added "cell line"

    • Added "organelle"

schema v1.0.0

  • Published minimal set of metadata requirements


Sourced from https://github.com/chanzuckerberg/data-guidance/blob/main/standards/cross-modality/1.1.0/schema.md