Cross Modality Schema
Contact: brianraymor@chanzuckerberg.com
Document Status: Approved
Version: 1.1.0
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14, RFC2119, and RFC8174 when, and only when, they appear in all capitals, as shown here.
Schema versioning
The cross modality schema version is based on Semantic Versioning.
Major version is incremented when incompatible schema updates are introduced:
Renaming metadata fields
Deprecating metadata fields
Changing the type or format of a metadata field
Minor version is incremented when additive schema updates are introduced:
Adding metadata fields
Changing the validation requirements for a metadata field
Patch version is incremented for editorial updates.
All changes are documented in the schema Changelog.
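The versioning rules above can be sketched as a small helper. This is only an illustration: the `Change` enum and the `bump` function are hypothetical names, and resetting the lower components to zero follows general Semantic Versioning practice rather than anything stated in this schema.

```python
from enum import Enum


class Change(Enum):
    """Kinds of schema update named in the versioning rules (hypothetical names)."""
    INCOMPATIBLE = "major"  # renaming, deprecating, or retyping a metadata field
    ADDITIVE = "minor"      # adding a field or changing its validation requirements
    EDITORIAL = "patch"     # editorial updates only


def bump(version: str, change: Change) -> str:
    """Return the next schema version for the given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change is Change.INCOMPATIBLE:
        return f"{major + 1}.0.0"
    if change is Change.ADDITIVE:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


print(bump("1.0.0", Change.ADDITIVE))  # 1.1.0
```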
Descriptive Metadata
The following descriptive metadata MUST be associated with all “registered - sharing” datasets to ensure that datasets can be found by searching for common experimental and biological characteristics of datasets. This list is intentionally limited to metadata that SHOULD be annotated at the time data are generated. These metadata MUST be programmatically validated to ensure compliance. Additional metadata MAY be annotated at the discretion of data stewards.
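As a rough sketch of what programmatic validation could start with, the function below checks that every submitter-annotated key defined in this schema is present and holds a list of strings. It assumes a dataset's metadata is available as a plain Python dictionary, which this schema does not prescribe; `REQUIRED_KEYS` and `missing_or_malformed` are hypothetical names. Whether an empty list is acceptable is not stated here, so the sketch does not reject one.

```python
# Submitter-annotated keys defined in this schema.
REQUIRED_KEYS = (
    "assay_ontology_term_id",
    "development_stage_ontology_term_id",
    "disease_ontology_term_id",
    "organism_ontology_term_id",
    "tissue_ontology_term_id",
    "tissue_type",
)


def missing_or_malformed(metadata: dict) -> list[str]:
    """Return the required keys that are absent or are not a List[String]."""
    problems = []
    for key in REQUIRED_KEYS:
        value = metadata.get(key)
        is_string_list = isinstance(value, list) and all(
            isinstance(item, str) for item in value
        )
        if not is_string_list:
            problems.append(key)
    return problems
```

A check like this only covers presence and type; the per-key value rules still need their own validation.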
Required Ontologies
With the exception of Cellosaurus, ontology terms for metadata MUST use OBO-format identifiers, meaning a CURIE (prefixed identifier) of the form Ontology:Identifier. For example, EFO:0000001 is a term in the Experimental Factor Ontology (EFO). Cellosaurus requires a prefixed identifier of the form Ontology_Identifier such as CVCL_1P02.
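The two identifier shapes described above could be checked with regular expressions along these lines. The prefixes come from the ontology table in this section. The assumptions that the local part of a CURIE is all digits, and that a Cellosaurus accession is `CVCL_` followed by uppercase letters and digits, are mine and are not stated in this schema. The check is purely syntactic and does not confirm that a term exists in its ontology.

```python
import re

# CURIE prefixes referenced in this schema (Cellosaurus is handled separately).
CURIE_PREFIXES = (
    "WBls", "WBbt", "CL", "FBbt", "FBdv", "EFO", "GO", "HsapDv",
    "MONDO", "MmusDv", "NCBITaxon", "PATO", "UBERON", "ZFA",
)

# Assumption: the identifier after the colon consists only of digits.
CURIE_PATTERN = re.compile(r"(?:%s):\d+" % "|".join(CURIE_PREFIXES))

# Assumption: Cellosaurus accessions are "CVCL_" plus uppercase alphanumerics.
CELLOSAURUS_PATTERN = re.compile(r"CVCL_[A-Z0-9]+")


def is_well_formed(term_id: str) -> bool:
    """Check only the shape of an identifier, not whether the term exists."""
    return bool(
        CURIE_PATTERN.fullmatch(term_id)
        or CELLOSAURUS_PATTERN.fullmatch(term_id)
    )


print(is_well_formed("EFO:0000001"))  # True
print(is_well_formed("CVCL_1P02"))    # True
print(is_well_formed("EFO_0000001"))  # False
```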
If an ontology is missing a required term, ontologists are responsive to New Term Requests [NTR], such as "[NTR] Version specific Visium assays", which was created for CELLxGENE Discover requirements.
The following ontologies are referenced in this schema:
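| Ontology | Prefix |
|---|---|
| C. elegans Development Ontology | WBls: |
| C. elegans Gross Anatomy Ontology | WBbt: |
| Cell Ontology | CL: |
| Cellosaurus | CVCL_ |
| Drosophila Anatomy Ontology | FBbt: |
| Drosophila Development Ontology | FBdv: |
| Experimental Factor Ontology | EFO: |
| Gene Ontology | GO: |
| Human Developmental Stages | HsapDv: |
| Mondo Disease Ontology | MONDO: |
| Mouse Developmental Stages | MmusDv: |
| NCBI organismal classification | NCBITaxon: |
| Phenotype And Trait Ontology | PATO: |
| Uberon multi-species anatomy ontology | UBERON: |
| Zebrafish Anatomy Ontology | ZFA: |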
assay_ontology_term_id
| Key | assay_ontology_term_id |
|---|---|
| Description | Defines the assay that was used to create the dataset |
| Annotator | Submitter MUST annotate. |
| Value | List[String]. The List element MUST be an Experimental Factor Ontology (EFO) term such as “EFO:0022605”. |
assay
| Key | assay |
|---|---|
| Annotator | System MUST annotate. |
| Value | List[String]. The List element MUST be the human-readable name assigned to the corresponding element in assay_ontology_term_id. |
development_stage_ontology_term_id
| Key | development_stage_ontology_term_id |
|---|---|
| Description | Defines the development stage(s) of the patients or organisms from which assayed biosamples were derived |
| Annotator | Submitter MUST annotate. |
| Value | List[String]. If the corresponding tissue_type is "cell line", the List element MUST be "na". If unavailable, the List element MUST be "unknown". |
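The "cell line" rule for this key could be checked as in the sketch below. It assumes that "corresponding" means the two lists are aligned by position, which this schema does not spell out; `check_development_stage_ids` is a hypothetical name. Only the "cell line" rule is modelled.

```python
def check_development_stage_ids(stage_ids, tissue_types):
    """Return error messages for elements that break the "cell line" rule.

    Assumption: stage_ids[i] corresponds to tissue_types[i].
    """
    errors = []
    if len(stage_ids) != len(tissue_types):
        errors.append(
            "development_stage_ontology_term_id and tissue_type differ in length"
        )
        return errors
    for index, (stage_id, tissue_type) in enumerate(zip(stage_ids, tissue_types)):
        if tissue_type == "cell line" and stage_id != "na":
            errors.append(
                f'element {index}: MUST be "na" for a cell line, got {stage_id!r}'
            )
    return errors


print(check_development_stage_ids(["na", "unknown"], ["cell line", "tissue"]))  # []
```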
development_stage
| Key | development_stage |
|---|---|
| Annotator | System MUST annotate. |
| Value | List[String]. The List element MUST be "na" if the value of development_stage_ontology_term_id is "na". The List element MUST be "unknown" if the value of development_stage_ontology_term_id is "unknown". Otherwise, the List element MUST be the human-readable name assigned to the corresponding element in development_stage_ontology_term_id. |
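A system annotator could derive these labels roughly as follows. The `label_lookup` mapping stands in for whatever ontology lookup the system actually uses; it, the function name, and the placeholder identifier and label in the usage lines are all hypothetical.

```python
# Values that are copied through unchanged rather than looked up.
SENTINELS = {"na", "unknown"}


def derive_development_stage(term_ids, label_lookup):
    """Map each term identifier to its label, passing "na" and "unknown" through.

    label_lookup is a hypothetical dict from term identifier to
    human-readable name; a missing identifier raises KeyError.
    """
    return [
        term_id if term_id in SENTINELS else label_lookup[term_id]
        for term_id in term_ids
    ]


# Placeholder identifier and label, for illustration only.
lookup = {"HsapDv:0000000": "hypothetical stage label"}
print(derive_development_stage(["na", "unknown", "HsapDv:0000000"], lookup))
```

Raising on an unknown identifier, rather than emitting an empty label, keeps an unresolved term from silently passing through.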
disease_ontology_term_id
| Key | disease_ontology_term_id |
|---|---|
| Description | Defines the disease of the patients or organisms from which assayed biosamples were derived |
| Annotator | Submitter MUST annotate. |
| Value | List[String]. The List element MUST be one of:
|
disease
| Key | disease |
|---|---|
| Annotator | System MUST annotate. |
| Value | List[String]. The List element MUST be the human-readable name assigned to the corresponding element in disease_ontology_term_id. |
organism_ontology_term_id
| Key | organism_ontology_term_id |
|---|---|
| Description | Defines the organism from which assayed biosamples were derived |
| Annotator | Submitter MUST annotate. |
| Value | List[String]. The List element MUST be an NCBI organismal classification term such as "NCBITaxon:9606". |
organism
| Key | organism |
|---|---|
| Annotator | System MUST annotate. |
| Value | List[String]. The List element MUST be the human-readable name assigned to the corresponding element in organism_ontology_term_id. |
tissue_ontology_term_id
| Key | tissue_ontology_term_id |
|---|---|
| Description | Defines the tissues from which assayed biosamples were derived |
| Annotator | Submitter MUST annotate. |
| Value | List[String].<br>If the corresponding tissue_type is "cell line", the List element MUST be a Cellosaurus term.<br>If the corresponding tissue_type is "organelle", the List element MUST be a descendant of GO:0005575 for cellular_component.<br>If the corresponding tissue_type is "tissue" or "organoid" then:<br>If the corresponding tissue_type is "cell culture", the following Cell Ontology (CL) terms MUST NOT be used: |
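The "cell line" and "organelle" rules could be sketched as below. Deciding whether a term descends from GO:0005575 requires the Gene Ontology itself, so the sketch takes a precomputed set of descendant identifiers as an argument; that set and the function name are hypothetical. The rules for "tissue", "organoid", and "cell culture" are not modelled here.

```python
def check_tissue_term(term_id, tissue_type, cellular_component_descendants):
    """Return an error message, or None if no modelled rule is violated.

    cellular_component_descendants is a hypothetical precomputed set of
    GO identifiers that descend from GO:0005575 (cellular_component).
    """
    if tissue_type == "cell line":
        if not term_id.startswith("CVCL_"):
            return f"{term_id!r} is not a Cellosaurus term"
    elif tissue_type == "organelle":
        if term_id not in cellular_component_descendants:
            return f"{term_id!r} is not a descendant of GO:0005575"
    # Other tissue_type values are not checked by this sketch.
    return None


descendants = {"GO:0005739"}  # mitochondrion, a cellular_component term
print(check_tissue_term("CVCL_1P02", "cell line", descendants))   # None
print(check_tissue_term("GO:0005739", "organelle", descendants))  # None
```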
tissue
| Key | tissue |
|---|---|
| Annotator | System MUST annotate. |
| Value | List[String]. The List element MUST be the human-readable name assigned to the corresponding element in tissue_ontology_term_id. |
tissue_type
| Key | tissue_type |
|---|---|
| Annotator | Submitter MUST annotate. |
| Value | List[String]. The List element MUST be one of:
|
Appendix A. Changelog
schema v1.1.0
Required Ontologies
Added requirements for prefixed ontology identifiers to address the Cellosaurus exception
Added Cellosaurus
Added Gene Ontology
development_stage_ontology_term_id
Require "na" when the corresponding tissue_type is "cell line"
development_stage
Require "na" when the corresponding development_stage_ontology_term_id is "na"
tissue_ontology_term_id
Require a Cellosaurus term identifier when the corresponding tissue_type is "cell line"
Require a descendant of GO:0005575 for cellular_component when the corresponding tissue_type is "organelle"
tissue_type
Added "cell line"
Added "organelle"
schema v1.0.0
Published minimal set of metadata requirements
Sourced from https://github.com/chanzuckerberg/data-guidance/blob/main/standards/cross-modality/1.1.0/schema.md