Census data releases
Last edited: December 15th, 2023.
What is a Census data release?
A Census data release is a Census build that is publicly hosted online. A Census build is a TileDB-SOMA collection containing the Census data from CZ CELLxGENE Discover, as specified in the Census schema.
Each Census build is named with a unique tag, normally its build date, e.g., "2023-05-15".
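Because build tags are normally dates, a client can turn a tag back into a date for sorting or comparison. This is a minimal sketch: parse_build_tag is a hypothetical helper, not part of the cellxgene_census API, and since tags are only normally build dates, it raises ValueError for a tag that cannot be parsed as YYYY-MM-DD instead of guessing.

```python
from datetime import date, datetime


def parse_build_tag(tag: str) -> date:
    """Parse a Census build tag such as "2023-05-15" into a date.

    Hypothetical helper: assumes the tag is the build date in YYYY-MM-DD
    form and raises ValueError when it cannot be parsed that way.
    """
    return datetime.strptime(tag, "%Y-%m-%d").date()


built = parse_build_tag("2023-05-15")
print(built.year, built.month, built.day)  # 2023 5 15
```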
Long-term supported (LTS) Census releases
To enable data stability and scientific reproducibility, CZ CELLxGENE Discover plans to perform regular LTS Census data releases:
- Published online every six months for public access, starting on May 15, 2023.
- Available for public access for at least 5 years after publication.
The most recent LTS Census data release is opened by default by the APIs and is referenced as census_version = "stable". To open a previous LTS Census data release, specify its build date directly: census_version = "[YYYY]-[MM]-[DD]".
Python
```python
import cellxgene_census
# "stable" resolves to the most recent LTS Census data release
census = cellxgene_census.open_soma(census_version="stable")
```
R
```r
library("cellxgene.census")
# "stable" resolves to the most recent LTS Census data release
census <- open_soma(census_version = "stable")
```
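The LTS policy above amounts to two small date calculations: "stable" denotes the LTS tag with the most recent build date, and each LTS release stays available for at least five years after publication. A sketch under stated assumptions: every LTS tag is a YYYY-MM-DD build date, publication happens on the build date, resolve_stable and min_available_until are hypothetical helpers (not part of the cellxgene_census API), and the tag list is copied from this page rather than fetched from the Census.

```python
from datetime import date, datetime
from typing import Sequence

# LTS build tags listed on this page at the time of its last edit.
LTS_TAGS = ["2023-05-15", "2023-07-25", "2023-12-15"]


def _to_date(tag: str) -> date:
    """Parse a YYYY-MM-DD build tag (assumption: tags are build dates)."""
    return datetime.strptime(tag, "%Y-%m-%d").date()


def resolve_stable(lts_tags: Sequence[str]) -> str:
    """Return the tag that "stable" denotes: the most recent LTS build."""
    if not lts_tags:
        raise ValueError("no LTS tags given")
    return max(lts_tags, key=_to_date)


def min_available_until(tag: str, years: int = 5) -> date:
    """Earliest date until which an LTS release should remain available.

    Assumes publication happens on the build date encoded in the tag.
    """
    published = _to_date(tag)
    try:
        return published.replace(year=published.year + years)
    except ValueError:  # 29 February falling in a non-leap target year
        return published.replace(year=published.year + years, day=28)


stable = resolve_stable(LTS_TAGS)
print(stable, min_available_until(stable))  # 2023-12-15 2028-12-15
```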
Weekly Census releases (latest)
CZ CELLxGENE Discover ingests a handful of new datasets every week. To quickly enable access to these new data via the Census, CZ CELLxGENE Discover plans to perform weekly Census data releases:
- Available for public access for 1 month.
The most recent weekly release can be opened by the APIs by specifying census_version = "latest".
Python
```python
import cellxgene_census
# "latest" resolves to the most recent weekly Census release
census = cellxgene_census.open_soma(census_version="latest")
```
R
```r
library("cellxgene.census")
# "latest" resolves to the most recent weekly Census release
census <- open_soma(census_version = "latest")
```
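The one-month availability of weekly releases can be checked with a little date arithmetic. This sketch rests on two assumptions: the tag of a weekly release is its build date, and "1 month" means one calendar month after that date, clamped to the end of shorter months. add_one_month and is_weekly_release_available are hypothetical helpers; actual retention is managed by the Census hosting, not computed on the client.

```python
import calendar
from datetime import date, datetime


def add_one_month(d: date) -> date:
    """Return the same day one calendar month later, clamped to month end."""
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def is_weekly_release_available(tag: str, today: date) -> bool:
    """Hypothetical check: is a weekly build still inside its 1-month window?

    Assumes the tag is the build date (YYYY-MM-DD) and that the window
    starts on that date.
    """
    built = datetime.strptime(tag, "%Y-%m-%d").date()
    return built <= today <= add_one_month(built)


print(is_weekly_release_available("2023-12-15", date(2024, 1, 10)))  # True
print(is_weekly_release_available("2023-12-15", date(2024, 1, 20)))  # False
```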
List of LTS Census data releases
LTS 2023-12-15
Open this data release by specifying census_version = "2023-12-15" in future calls to open_soma().
Version information
| Information | Value |
|---|---|
| Census schema version | |
| Census build date | 2023-12-15 |
| Dataset schema version | |
| Number of datasets | 651 |
Cell and donor counts
| Type | Homo sapiens | Mus musculus |
|---|---|---|
| Total cells | 62,998,417 | 5,684,805 |
| Unique cells | 36,227,903 | 4,128,230 |
| Number of donors | 15,588 | 1,990 |
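Subtracting "Unique cells" from "Total cells" gives the number of cell records that duplicate a cell already counted elsewhere in the Census. The sketch below does this arithmetic with the counts from the table above; it assumes that "Unique cells" counts the cells marked is_primary_data = True, and the dictionary layout is only illustrative.

```python
# Cell counts for LTS 2023-12-15, copied from the table above.
COUNTS = {
    "Homo sapiens": {"total": 62_998_417, "unique": 36_227_903},
    "Mus musculus": {"total": 5_684_805, "unique": 4_128_230},
}


def non_unique_cells(total: int, unique: int) -> int:
    """Cells counted in the total but not among the unique cells."""
    return total - unique


for organism, c in COUNTS.items():
    extra = non_unique_cells(c["total"], c["unique"])
    print(f"{organism}: {extra:,} non-unique cells ({extra / c['total']:.1%} of total)")
# Homo sapiens: 26,770,514 non-unique cells (42.5% of total)
# Mus musculus: 1,556,575 non-unique cells (27.4% of total)
```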
Cell metadata
| Category | Homo sapiens | Mus musculus |
|---|---|---|
| Assay | 20 | 10 |
| Cell type | 631 | 248 |
| Development stage | 173 | 36 |
| Disease | 72 | 5 |
| Self-reported ethnicity | 30 | NA |
| Sex | 3 | 3 |
| Suspension type | 2 | 2 |
| Tissue | 230 | 74 |
| Tissue general | 53 | 27 |
Cell embeddings
Find out more on the Census model page.
Available obsm slots:
| Method | Homo sapiens | Mus musculus |
|---|---|---|
| scVI | | |
| Fine-tuned Geneformer | | NA |
LTS 2023-07-25
Open this data release by specifying census_version = "2023-07-25" in future calls to open_soma().
Version information
| Information | Value |
|---|---|
| Census schema version | |
| Census build date | 2023-07-25 |
| Dataset schema version | |
| Number of datasets | 593 |
Cell and donor counts
| Type | Homo sapiens | Mus musculus |
|---|---|---|
| Total cells | 56,400,873 | 5,255,245 |
| Unique cells | 33,364,242 | 4,083,531 |
| Number of donors | 13,035 | 1,417 |
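Comparing two LTS releases is a matter of subtracting their table entries. As an example, the sketch below computes how much the unique-cell and donor counts grew from LTS 2023-07-25 to LTS 2023-12-15, using only numbers from the tables on this page; the nested-dictionary layout and the growth helper are illustrative, not part of any Census API.

```python
from typing import Dict

# Unique cells and donors per organism, copied from the LTS tables on this page.
LTS: Dict[str, Dict[str, Dict[str, int]]] = {
    "2023-07-25": {
        "Homo sapiens": {"unique_cells": 33_364_242, "donors": 13_035},
        "Mus musculus": {"unique_cells": 4_083_531, "donors": 1_417},
    },
    "2023-12-15": {
        "Homo sapiens": {"unique_cells": 36_227_903, "donors": 15_588},
        "Mus musculus": {"unique_cells": 4_128_230, "donors": 1_990},
    },
}


def growth(old: str, new: str) -> Dict[str, Dict[str, int]]:
    """Per-organism difference in each count between two LTS releases."""
    return {
        organism: {key: LTS[new][organism][key] - value for key, value in counts.items()}
        for organism, counts in LTS[old].items()
    }


print(growth("2023-07-25", "2023-12-15"))
```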
Cell metadata
| Category | Homo sapiens | Mus musculus |
|---|---|---|
| Assay | 19 | 9 |
| Cell type | 613 | 248 |
| Development stage | 164 | 33 |
| Disease | 64 | 5 |
| Self-reported ethnicity | 26 | NA |
| Sex | 3 | 3 |
| Suspension type | 2 | 2 |
| Tissue | 220 | 66 |
| Tissue general | 54 | 27 |
LTS 2023-05-15
Open this data release by specifying census_version = "2023-05-15" in future calls to open_soma().
🔴 Errata 🔴
Duplicate observations with is_primary_data = True
To prevent duplicate data in analyses, each observation (cell) should be marked is_primary_data = True exactly once in the Census. In this LTS release, 243,569 observations have been identified that are represented at least twice with is_primary_data = True.
This issue will be corrected in the following LTS data release by identifying and marking only one cell out of each set of duplicates as is_primary_data = True.
If you wish to use this data release, consider filtering out all 243,569 of these cells using the soma_joinids provided in the file duplicate_cells_census_LTS_2023-05-15.csv.zip. You can filter specific cells with the value_filter or obs_value_filter arguments of the querying API functions; for more information, follow this tutorial.
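A minimal sketch of the exclusion step using only the Python standard library. It assumes that the CSV inside duplicate_cells_census_LTS_2023-05-15.csv.zip has a header row with a column named soma_joinid; that column name, the helpers load_duplicate_ids and drop_duplicates, and the dict-per-row layout are all hypothetical, so check the actual file before relying on them.

```python
import csv
import io
import zipfile
from typing import IO, Any, Dict, Iterable, List, Set, Union

# Hypothetical column name: check the header of the real CSV file.
ID_COLUMN = "soma_joinid"


def load_duplicate_ids(zip_source: Union[str, IO[bytes]]) -> Set[int]:
    """Read soma_joinids from the first .csv member of a zip archive.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        members = [name for name in archive.namelist() if name.endswith(".csv")]
        if not members:
            raise ValueError("no .csv file found in the archive")
        with archive.open(members[0]) as raw, io.TextIOWrapper(
            raw, encoding="utf-8", newline=""
        ) as text:
            return {int(row[ID_COLUMN]) for row in csv.DictReader(text)}


def drop_duplicates(
    obs_rows: Iterable[Dict[str, Any]], duplicate_ids: Set[int]
) -> List[Dict[str, Any]]:
    """Keep only obs rows whose soma_joinid is not in the duplicate set."""
    return [row for row in obs_rows if row["soma_joinid"] not in duplicate_ids]


# Usage (assumes the zip file has been downloaded to the working directory):
# duplicate_ids = load_duplicate_ids("duplicate_cells_census_LTS_2023-05-15.csv.zip")
```

If the obs metadata is already loaded as a pandas DataFrame with a soma_joinid column, the equivalent exclusion is obs[~obs["soma_joinid"].isin(duplicate_ids)].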
Version information
| Information | Value |
|---|---|
| Census schema version | |
| Census build date | 2023-05-15 |
| Dataset schema version | |
| Number of datasets | 562 |
Cell and donor counts
| Type | Homo sapiens | Mus musculus |
|---|---|---|
| Total cells | 53,794,728 | 4,086,032 |
| Unique cells | 33,758,887 | 2,914,318 |
| Number of donors | 12,493 | 1,362 |
Cell metadata
| Category | Homo sapiens | Mus musculus |
|---|---|---|
| Assay | 20 | 9 |
| Cell type | 604 | 226 |
| Development stage | 164 | 30 |
| Disease | 68 | 5 |
| Self-reported ethnicity | 26 | NA |
| Sex | 3 | 3 |
| Suspension type | 2 | 2 |
| Tissue | 227 | 51 |
| Tissue general | 61 | 27 |