Cross Modality Schema
Contact: brianraymor@chanzuckerberg.com
Document Status: Approved
Version: 1.1.0
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14, RFC2119, and RFC8174 when, and only when, they appear in all capitals, as shown here.
Schema versioning
The cross modality schema version is based on Semantic Versioning.
Major version is incremented when incompatible schema updates are introduced:
Renaming metadata fields
Deprecating metadata fields
Changing the type or format of a metadata field
Minor version is incremented when additive schema updates are introduced:
Adding metadata fields
Changing the validation requirements for a metadata field
Patch version is incremented for editorial updates.
All changes are documented in the schema Changelog.
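The bump rules above can be sketched in Python as a lookup from change kind to version component, where the largest bump wins and the lower components reset to zero as in Semantic Versioning. The change labels (`rename_field`, `add_field`, and so on) and the function name `next_version` are hypothetical and are not defined by this schema.

```python
# Hypothetical change labels mapped to the version component each one bumps,
# following the major/minor/patch rules listed above.
BUMP_FOR_CHANGE = {
    "rename_field": "major",
    "deprecate_field": "major",
    "change_field_type_or_format": "major",
    "add_field": "minor",
    "change_validation_requirements": "minor",
    "editorial": "patch",
}


def next_version(current: str, changes: list[str]) -> str:
    """Return the next schema version for a set of changes; the largest bump wins."""
    major, minor, patch = (int(part) for part in current.split("."))
    bumps = {BUMP_FOR_CHANGE[change] for change in changes}
    if "major" in bumps:
        return f"{major + 1}.0.0"  # minor and patch reset to zero
    if "minor" in bumps:
        return f"{major}.{minor + 1}.0"  # patch resets to zero
    if "patch" in bumps:
        return f"{major}.{minor}.{patch + 1}"
    return current  # no changes, no new version
```

For example, a release that only adds fields and makes editorial fixes moves 1.0.0 to 1.1.0.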
Descriptive Metadata
The following descriptive metadata MUST be associated with all “registered - sharing” datasets to ensure that datasets can be found by searching for common experimental and biological characteristics of datasets. This list is intentionally limited to metadata that SHOULD be annotated at the time data are generated. These metadata MUST be programmatically validated to ensure compliance. Additional metadata MAY be annotated at the discretion of data stewards.
Required Ontologies
With the exception of Cellosaurus, ontology terms for metadata MUST use OBO-format identifiers, meaning a CURIE (prefixed identifier) of the form Ontology:Identifier. For example, EFO:0000001 is a term in the Experimental Factor Ontology (EFO). Cellosaurus requires a prefixed identifier of the form Ontology_Identifier such as CVCL_1P02.
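A minimal Python sketch of the identifier rule follows. It assumes that a CURIE prefix starts with a letter and that the local identifier is alphanumeric, and that a Cellosaurus accession is `CVCL_` followed by four uppercase letters or digits. `parse_term_id` is a hypothetical helper, not part of any published validator, and it checks only the shape of an identifier, not whether the term exists in the ontology.

```python
import re

# Assumed shapes: an OBO-format CURIE is "Prefix:LocalId"; a Cellosaurus
# accession is "CVCL_" plus four uppercase letters or digits.
CURIE_PATTERN = re.compile(r"([A-Za-z][A-Za-z0-9]*):([A-Za-z0-9_]+)")
CELLOSAURUS_PATTERN = re.compile(r"CVCL_([A-Z0-9]{4})")


def parse_term_id(term_id: str) -> tuple[str, str]:
    """Split a term identifier into (prefix, local_id) or raise ValueError."""
    cellosaurus = CELLOSAURUS_PATTERN.fullmatch(term_id)
    if cellosaurus:
        # The Cellosaurus exception: the separator is "_" rather than ":".
        return "CVCL", cellosaurus.group(1)
    curie = CURIE_PATTERN.fullmatch(term_id)
    if curie and curie.group(1) != "CVCL":
        return curie.group(1), curie.group(2)
    # Anything else, including a Cellosaurus ID written with a colon, is rejected.
    raise ValueError(f"not a valid term identifier: {term_id!r}")
```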
If an ontology is missing a required term, ontologists are responsive to New Term Requests (NTRs), such as “[NTR] Version specific Visium assays”, which was created for CELLxGENE Discover requirements.
The following ontologies are referenced in this schema:
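Ontology | Prefix |
---|---|
C. elegans Development Ontology | WBls: |
C. elegans Gross Anatomy Ontology | WBbt: |
Cell Ontology | CL: |
Cellosaurus | CVCL_ |
Drosophila Anatomy Ontology | FBbt: |
Drosophila Development Ontology | FBdv: |
Experimental Factor Ontology | EFO: |
Gene Ontology | GO: |
Human Developmental Stages | HsapDv: |
Mondo Disease Ontology | MONDO: |
Mouse Developmental Stages | MmusDv: |
NCBI organismal classification | NCBITaxon: |
Phenotype And Trait Ontology | PATO: |
Uber-anatomy ontology | UBERON: |
Zebrafish Anatomy Ontology | ZFA: |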
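Assuming a term is acceptable only if its prefix appears in the ontology table above, a validator might begin with a prefix check like the hypothetical `referenced_prefix` helper below. It does not decide which ontology a particular field allows; the per-field rules that follow narrow that further.

```python
from typing import Optional

# Prefixes exactly as listed in the Required Ontologies table.
REFERENCED_PREFIXES = (
    "WBls:", "WBbt:", "CL:", "CVCL_", "FBbt:", "FBdv:", "EFO:", "GO:",
    "HsapDv:", "MONDO:", "MmusDv:", "NCBITaxon:", "PATO:", "UBERON:", "ZFA:",
)


def referenced_prefix(term_id: str) -> Optional[str]:
    """Return the table prefix that term_id starts with, or None if none matches."""
    for prefix in REFERENCED_PREFIXES:
        # Require at least one character after the prefix itself.
        if term_id.startswith(prefix) and len(term_id) > len(prefix):
            return prefix
    return None
```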
assay_ontology_term_id
Key | assay_ontology_term_id |
---|---|
Description | Defines the assay that was used to create the dataset |
Annotator | Submitter MUST annotate. |
Value | List[String]. The List element MUST be an Experimental Factor Ontology (EFO) term such as "EFO:0022605". |
assay
Key | assay |
---|---|
Annotator | System MUST annotate. |
Value | List[String]. The List element MUST be the human-readable name assigned to the corresponding element in assay_ontology_term_id. |
development_stage_ontology_term_id
Key | development_stage_ontology_term_id |
---|---|
Description | Defines the development stage(s) of the patients or organisms from which assayed biosamples were derived |
Annotator | Submitter MUST annotate. |
Value | List[String]. If the corresponding tissue_type is "cell line", the List element MUST be "na". If unavailable, the List element MUST be "unknown". |
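The two stated rules for this field can be checked as below. The sketch assumes that "corresponding" means the element at the same index in tissue_type, and that a blank string is how an unavailable value would appear; both are interpretations, not schema text. Organism-specific ontology requirements are not checked, and `check_development_stage_ids` is a hypothetical name.

```python
def check_development_stage_ids(stage_ids: list[str], tissue_types: list[str]) -> list[str]:
    """Return error messages; an empty list means the checked rules pass."""
    # Assumption: elements correspond by position, so the lists must align.
    if len(stage_ids) != len(tissue_types):
        return ["development_stage_ontology_term_id and tissue_type differ in length"]
    errors = []
    for index, (stage_id, tissue_type) in enumerate(zip(stage_ids, tissue_types)):
        if tissue_type == "cell line" and stage_id != "na":
            errors.append(f"element {index}: expected 'na' for a cell line, got {stage_id!r}")
        elif not stage_id.strip():
            # Assumption: a blank value means "unavailable", which MUST be "unknown".
            errors.append(f"element {index}: use 'unknown' instead of a blank value")
    return errors
```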
development_stage
Key | development_stage |
---|---|
Annotator | System MUST annotate. |
Value | List[String]. The List element MUST be "na" if the corresponding element in development_stage_ontology_term_id is "na". The List element MUST be "unknown" if the corresponding element in development_stage_ontology_term_id is "unknown". Otherwise, the List element MUST be the human-readable name assigned to the corresponding element in development_stage_ontology_term_id. |
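Because development_stage is annotated by the system, one way to produce it is an element-by-element mapping that copies "na" and "unknown" through and looks up every other identifier. `labels_by_id` stands in for a lookup against an ontology release (an assumption), and `derive_development_stage` is a hypothetical name.

```python
def derive_development_stage(stage_ids: list[str], labels_by_id: dict[str, str]) -> list[str]:
    """Map each development_stage_ontology_term_id element to its development_stage value."""
    derived = []
    for stage_id in stage_ids:
        if stage_id in ("na", "unknown"):
            # Sentinel values are copied through unchanged.
            derived.append(stage_id)
        elif stage_id in labels_by_id:
            derived.append(labels_by_id[stage_id])
        else:
            raise ValueError(f"no human-readable name found for {stage_id!r}")
    return derived
```

With a placeholder lookup `{"HsapDv:0000001": "placeholder label"}` (not a real HsapDv label), the input `["na", "unknown", "HsapDv:0000001"]` yields `["na", "unknown", "placeholder label"]`, preserving order and length.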
disease_ontology_term_id
Key | disease_ontology_term_id |
---|---|
Description | Defines the disease of the patients or organisms from which assayed biosamples were derived |
Annotator | Submitter MUST annotate. |
Value | List[String]. The List element MUST be one of:
|
disease
Key | disease |
---|---|
Annotator | System MUST annotate. |
Value | List[String]. The List element MUST be the human-readable name assigned to the corresponding element in disease_ontology_term_id. |
organism_ontology_term_id
Key | organism_ontology_term_id |
---|---|
Description | Defines the organism from which assayed biosamples were derived |
Annotator | Submitter MUST annotate. |
Value | List[String]. The List element MUST be an NCBI organismal classification term such as "NCBITaxon:9606". |
organism
Key | organism |
---|---|
Annotator | System MUST annotate. |
Value | List[String]. The List element MUST be the human-readable name assigned to the corresponding element in organism_ontology_term_id. |
tissue_ontology_term_id
Key | tissue_ontology_term_id |
---|---|
Description | Defines the tissues from which assayed biosamples were derived |
Annotator | Submitter MUST annotate. |
Value |
List[String]. If the corresponding tissue_type is "cell line", the List element MUST be a Cellosaurus term. If the corresponding tissue_type is "organelle", the List element MUST be a descendant of GO:0005575 for cellular_component. If the corresponding tissue_type is "tissue" or "organoid", then:
If the corresponding tissue_type is "cell culture", the following Cell Ontology (CL) terms MUST NOT be used:
|
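The cell line and organelle rules can be sketched as follows. The descendant test walks is_a parent edges breadth-first from the term and does not count GO:0005575 as a descendant of itself, which is an interpretation. `parents` is a hypothetical mapping that a real validator would build from a Gene Ontology release, and the Cellosaurus accession shape is assumed. The "tissue", "organoid", and "cell culture" rules are left out because their term lists do not appear in this section.

```python
import re
from collections import deque
from typing import Optional

CELLOSAURUS_PATTERN = re.compile(r"CVCL_[A-Z0-9]{4}")  # assumed accession shape
CELLULAR_COMPONENT = "GO:0005575"


def is_descendant(term_id: str, ancestor_id: str, parents: dict[str, list[str]]) -> bool:
    """Walk parent edges breadth-first; True if ancestor_id is reached from term_id."""
    seen = set()
    queue = deque(parents.get(term_id, ()))  # start at the parents, so a term is not its own descendant
    while queue:
        current = queue.popleft()
        if current == ancestor_id:
            return True
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, ()))
    return False


def check_tissue_term(term_id: str, tissue_type: str, parents: dict[str, list[str]]) -> Optional[str]:
    """Return an error message, or None when the implemented rule for tissue_type passes."""
    if tissue_type == "cell line" and not CELLOSAURUS_PATTERN.fullmatch(term_id):
        return f"{term_id!r} is not a Cellosaurus identifier"
    if tissue_type == "organelle" and not is_descendant(term_id, CELLULAR_COMPONENT, parents):
        return f"{term_id!r} is not a descendant of {CELLULAR_COMPONENT}"
    return None  # other tissue_type values are not checked in this sketch
```

The edges in any test graph, such as `{"GO:9000002": ["GO:9000001"], "GO:9000001": ["GO:0005575"]}`, are placeholders and are not taken from a GO release.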
tissue
Key | tissue |
---|---|
Annotator | System MUST annotate. |
Value | List[String]. The List element MUST be the human-readable name assigned to the corresponding element in tissue_ontology_term_id. |
tissue_type
Key | tissue_type |
---|---|
Annotator | Submitter MUST annotate. |
Value | List[String]. The List element MUST be one of:
|
Appendix A. Changelog
schema v1.1.0
Required Ontologies
Added requirements for prefixed ontology identifiers to address the Cellosaurus exception
Added Cellosaurus
Added Gene Ontology
development_stage_ontology_term_id
Require "na" when the corresponding tissue_type is "cell line"
development_stage
Require "na" when the corresponding development_stage_ontology_term_id is "na"
tissue_ontology_term_id
Require a Cellosaurus term identifier when the corresponding tissue_type is "cell line"
Require a descendant of GO:0005575 for cellular_component when the corresponding tissue_type is "organelle"
tissue_type
Added "cell line"
Added "organelle"
schema v1.0.0
Published minimal set of metadata requirements
Sourced from https://github.com/chanzuckerberg/data-guidance/blob/main/standards/cross-modality/1.1.0/schema.md