Census data releases

Last edited: Nov 8th, 2025.

Contents:

  1. What is a Census data release?

  2. List of LTS Census data releases

  3. Compatibility with package versions

What is a Census data release?

A Census data release is a Census build that is publicly hosted online. A Census build is a TileDB-SOMA collection containing the Census data from CZ CELLxGENE Discover, as specified in the Census schema.

Any given Census build is named with a unique tag, normally the date of build, e.g., "2025-01-30".

Long-term supported (LTS) Census releases

To enable data stability and scientific reproducibility, CZ CELLxGENE Discover plans to keep certain Census data releases available for public access for at least 5 years after publication.

The most recent LTS Census data release is the one the APIs open by default, and it is identified by census_version = "stable". To open a previous LTS Census data release, specify its build date directly: census_version = "[YYYY]-[MM]-[DD]".

Python

import cellxgene_census
census = cellxgene_census.open_soma(census_version = "stable")

R

library("cellxgene.census")
census <- open_soma(census_version = "stable")

Weekly Census releases (latest)

CZ CELLxGENE Discover ingests a handful of new datasets every week. To quickly enable access to these new data via the Census, CZ CELLxGENE Discover plans to perform weekly Census data releases, available for public access for 1 month.

The most recent weekly release can be opened by the APIs by specifying census_version = "latest".

Python

import cellxgene_census
census = cellxgene_census.open_soma(census_version = "latest")

R

library("cellxgene.census")
census <- open_soma(census_version = "latest")
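As described above, census_version accepts either an alias ("stable" or "latest") or a build date in YYYY-MM-DD form. The following is a minimal sketch of a format check that could run before calling open_soma(). The helper name is_valid_census_version is hypothetical and is not part of cellxgene_census; it only checks the shape of the string, not whether a release with that tag is actually hosted.

```python
from datetime import date

# Hypothetical helper (not part of cellxgene_census): accepts the two aliases
# named in this page, or a build date written as YYYY-MM-DD.
ALIASES = {"stable", "latest"}


def is_valid_census_version(version: str) -> bool:
    """Return True if `version` looks like a usable census_version value."""
    if version in ALIASES:
        return True
    try:
        parsed = date.fromisoformat(version)
    except ValueError:
        return False
    # Newer Python versions also parse other ISO layouts (e.g. "20250130"),
    # so require that the string round-trips to the canonical YYYY-MM-DD form.
    return parsed.isoformat() == version


checks = {
    v: is_valid_census_version(v)
    for v in ["stable", "latest", "2025-01-30", "2025-13-01", "20250130"]
}
```

A check like this only catches malformed tags early; an unknown but well-formed date would still have to be rejected by the API itself.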

List of LTS Census data releases

LTS 2025-11-08

Open this data release by specifying census_version = "2025-11-08" in future calls to open_soma().

Version information

Information              Value
Census schema version    2.4.0
Census build date        2025-11-08
Dataset schema version   7.0.0
Number of datasets       1845

Schema changes

Census schema 2.4.0 has a few important changes that may need adjustments in analysis code:

  • The obs disease and disease_ontology_term_id fields may now contain multiple values delimited by ' || ', so exact string equality queries on these fields may yield incomplete results.

  • The var feature_name field is no longer necessarily unique. Previously, colliding gene symbols were disambiguated by appending their feature_id (Ensembl gene ID). feature_name is now populated with the exact gene symbols, even if used multiple times, while feature_id remains unique.

These reflect changes in the newer CELLxGENE Dataset schema version.
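The two schema changes above can be handled in plain Python once the obs and var metadata have been read into memory. This is a minimal sketch on made-up rows: the disease labels, gene symbols, and feature IDs below are placeholders rather than real Census values, and the helper names split_terms and has_term are hypothetical.

```python
from collections import defaultdict

DELIMITER = " || "  # multi-value delimiter described in the schema change


def split_terms(value: str) -> list:
    """Split a possibly multi-valued field into its individual terms."""
    return [term.strip() for term in value.split(DELIMITER)]


def has_term(value: str, term: str) -> bool:
    """True if `term` is one of the delimited values in `value`."""
    return term in split_terms(value)


# Placeholder obs rows (not real Census data).
obs_rows = [
    {"soma_joinid": 0, "disease": "normal"},
    {"soma_joinid": 1, "disease": "disease A || disease B"},
    {"soma_joinid": 2, "disease": "disease B"},
]

# Exact equality misses the multi-valued row; splitting first does not.
exact_match = [r["soma_joinid"] for r in obs_rows if r["disease"] == "disease B"]
split_match = [r["soma_joinid"] for r in obs_rows if has_term(r["disease"], "disease B")]

# Placeholder var rows: feature_name may repeat, feature_id stays unique.
var_rows = [
    {"feature_id": "ID_0001", "feature_name": "GENE1"},
    {"feature_id": "ID_0002", "feature_name": "GENE1"},
    {"feature_id": "ID_0003", "feature_name": "GENE2"},
]

# Map each gene symbol to every feature_id that carries it.
ids_by_name = defaultdict(list)
for row in var_rows:
    ids_by_name[row["feature_name"]].append(row["feature_id"])

ambiguous_names = sorted(name for name, ids in ids_by_name.items() if len(ids) > 1)
```

Keying downstream lookups on feature_id rather than feature_name avoids silently merging or dropping genes that share a symbol.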

Cell counts

Species              Total cells    Unique cells
Homo sapiens         162,025,130    99,633,637
Mus musculus         46,299,127     21,029,771
Macaca mulatta       7,010,229      2,929,014
Callithrix jacchus   2,275,451      1,712,738
Pan troglodytes      158,099        158,099

Cell metadata

Category                  Homo sapiens   Mus musculus   Callithrix jacchus   Macaca mulatta   Pan troglodytes
Assay                     37             16             1                    2                1
Cell type                 898            473            40                   54               25
Development stage         194            66             3                    4                1
Disease                   192            16             1                    1                1
Self-reported ethnicity   33             1              1                    1                1
Sex                       3              3              2                    3                2
Suspension type           2              2              1                    2                1
Tissue                    417            101            33                   29               1
Tissue general            70             36             1                    2                1

Embeddings

Find out more in the Census models page.

Available embeddings can be accessed via cellxgene_census.experimental.get_embedding(), or by specifying the obs_embeddings/var_embeddings field in cellxgene_census.get_anndata().

Cells

Method                         Homo sapiens        Mus musculus
scVI                           scvi                scvi
TranscriptFormer tf-sapiens    tf-sapiens          N/A
TranscriptFormer tf-exemplar   tf-exemplar-human   tf-exemplar-mouse

LTS 2025-01-30

Open this data release by specifying census_version = "2025-01-30" in future calls to open_soma().

Version information

Information              Value
Census schema version    2.1.0
Census build date        2025-01-30
Dataset schema version   5.2.0
Number of datasets       1573

Cell and donor counts

Type           Homo sapiens   Mus musculus
Total cells    109,085,698    45,351,496
Unique cells   65,601,657     20,208,302

Cell metadata

Category                  Homo sapiens   Mus musculus
Assay                     31             17
Cell type                 827            453
Development stage         179            58
Disease                   140            12
Self-reported ethnicity   36             1
Sex                       3              3
Suspension type           1              1
Tissue                    379            99
Tissue general            68             36

Embeddings

Find out more in the Census model page.

Available embeddings can be accessed via cellxgene_census.experimental.get_embedding(), or by specifying the obs_embeddings/var_embeddings field in cellxgene_census.get_anndata().

Cells

Method   Homo sapiens   Mus musculus
scVI     scvi           scvi

LTS 2024-07-01

Open this data release by specifying census_version = "2024-07-01" in future calls to open_soma().

Version information

Information              Value
Census schema version    2.0.1
Census build date        2024-05-20
Dataset schema version   5.0.0
Number of datasets       812

Cell and donor counts

Type           Homo sapiens   Mus musculus
Total cells    74,322,510     41,233,630
Unique cells   44,265,932     16,332,034

Cell metadata

Category                  Homo sapiens   Mus musculus
Assay                     24             11
Cell type                 698            364
Development stage         176            48
Disease                   109            7
Self-reported ethnicity   31             NA
Sex                       3              3
Suspension type           2              2
Tissue                    267            84
Tissue general            55             29

Embeddings

Find out more in the Census model page.

Available embeddings can be accessed via cellxgene_census.experimental.get_embedding(), or by specifying the obs_embeddings/var_embeddings field in cellxgene_census.get_anndata().

Cells

Method       Homo sapiens   Mus musculus
scVI         scvi           scvi
Geneformer   geneformer     NA

LTS 2023-12-15

Open this data release by specifying census_version = "2023-12-15" in future calls to open_soma().

Version information

Information              Value
Census schema version    1.2.0
Census build date        2023-12-15
Dataset schema version   3.1.0
Number of datasets       651

Cell and donor counts

Type           Homo sapiens   Mus musculus
Total cells    62,998,417     5,684,805
Unique cells   36,227,903     4,128,230

Cell metadata

Category                  Homo sapiens   Mus musculus
Assay                     20             10
Cell type                 631            248
Development stage         173            36
Disease                   72             5
Self-reported ethnicity   30             NA
Sex                       3              3
Suspension type           2              2
Tissue                    230            74
Tissue general            53             27

Embeddings

Find out more in the Census model page.

Available embeddings can be accessed via cellxgene_census.experimental.get_embedding(), or by specifying the obs_embeddings/var_embeddings field in cellxgene_census.get_anndata().

Cells

Method                      Homo sapiens   Mus musculus
scVI                        scvi           scvi
Fine-tuned Geneformer       geneformer     NA
scGPT                       scgpt          NA
Universal Cell Embeddings   uce            NA
NMF                         nmf            NA

Features

Method   Homo sapiens   Mus musculus
NMF      nmf            NA

LTS 2023-07-25

Open this data release by specifying census_version = "2023-07-25" in future calls to open_soma().

Version information

Information              Value
Census schema version    1.0.0
Census build date        2023-07-25
Dataset schema version   3.0.0
Number of datasets       593

Cell and donor counts

Type           Homo sapiens   Mus musculus
Total cells    56,400,873     5,255,245
Unique cells   33,364,242     4,083,531

Cell metadata

Category                  Homo sapiens   Mus musculus
Assay                     19             9
Cell type                 613            248
Development stage         164            33
Disease                   64             5
Self-reported ethnicity   26             NA
Sex                       3              3
Suspension type           2              2
Tissue                    220            66
Tissue general            54             27

LTS 2023-05-15

Open this data release by specifying census_version = "2023-05-15" in future calls to open_soma().

🔴 Errata 🔴

Duplicate observations with is_primary_data = True

To prevent duplicate data in analyses, each observation (cell) should be marked is_primary_data = True exactly once in the Census. Since this LTS release was published, 243,569 observations have been identified that are represented at least twice with is_primary_data = True.

This issue will be corrected in the following LTS data release, by identifying and marking only one cell out of the duplicates as is_primary_data = True.

If you wish to use this data release, consider filtering out all 243,569 of these cells using the soma_joinids provided in the file duplicate_cells_census_LTS_2023-05-15.csv.zip. You can filter specific cells with the value_filter or obs_value_filter arguments of the querying API functions; for more information, follow this tutorial.
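As an alternative to filtering at query time, the flagged cells could be dropped after the obs metadata has been read into memory. The sketch below shows that idea on in-memory placeholder data. It assumes the unzipped CSV has a header column named soma_joinid, which is not confirmed by this page, so adjust the column name to match the real file. The function names are hypothetical.

```python
import csv
import io


def load_duplicate_joinids(csv_stream, column="soma_joinid"):
    """Read the soma_joinids to exclude into a set.

    Assumption: the CSV has a header row containing `column`.
    """
    reader = csv.DictReader(csv_stream)
    return {int(row[column]) for row in reader}


def drop_duplicates(obs_rows, duplicate_ids):
    """Keep only rows whose soma_joinid is not in the exclusion set."""
    return [row for row in obs_rows if row["soma_joinid"] not in duplicate_ids]


# Placeholder stand-in for the unzipped file contents (not the real IDs).
example_csv = io.StringIO("soma_joinid\n3\n7\n")
duplicate_ids = load_duplicate_joinids(example_csv)

# Placeholder obs rows standing in for a query result.
obs_rows = [{"soma_joinid": i} for i in range(10)]
kept = drop_duplicates(obs_rows, duplicate_ids)
kept_ids = [row["soma_joinid"] for row in kept]
```

Holding the excluded IDs in a set keeps each membership test constant-time, which matters when the exclusion list has hundreds of thousands of entries.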

Version information

Information              Value
Census schema version    1.0.0
Census build date        2023-05-15
Dataset schema version   3.0.0
Number of datasets       562

Cell and donor counts

Type           Homo sapiens   Mus musculus
Total cells    53,794,728     4,086,032
Unique cells   33,758,887     2,914,318

Cell metadata

Category                  Homo sapiens   Mus musculus
Assay                     20             9
Cell type                 604            226
Development stage         164            30
Disease                   68             5
Self-reported ethnicity   26             NA
Sex                       3              3
Suspension type           2              2
Tissue                    227            51
Tissue general            61             27

Compatibility with package versions

Due to the nature of the Census storage backend, the format version will change from time to time. Format upgrades are always backwards compatible, but they're not always forwards compatible, which means that reading a recent Census data version using an older version of the package might result in an error. We aim to guarantee the following policy:

  • Every Census package version released after an LTS will be able to read every Census data release until the next LTS.

The current LTS release (2025-11-08) is compatible with the following package versions:

  • 1.17.x
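A script that depends on the current LTS release could check the installed package version against the compatible series listed above before opening the Census. This is a minimal sketch: the helper name is_compatible is hypothetical, the version string is passed in by the caller, and the only compatible series it knows about is the "1.17.x" entry from this page.

```python
# Compatible major.minor series for the 2025-11-08 LTS release, as listed above.
COMPATIBLE_SERIES = ("1.17",)


def is_compatible(package_version: str, compatible_series=COMPATIBLE_SERIES) -> bool:
    """Return True if `package_version` falls in a listed major.minor series.

    Hypothetical helper: compares only the first two dot-separated components,
    so "1.17.0" and "1.17.3" both match the "1.17.x" entry.
    """
    parts = package_version.split(".")
    if len(parts) < 2:
        return False
    major_minor = ".".join(parts[:2])
    return major_minor in compatible_series


results = {v: is_compatible(v) for v in ["1.17.0", "1.17.3", "1.16.2", "1.170.0"]}
```

Comparing whole components rather than string prefixes avoids treating a version such as 1.170.0 as part of the 1.17.x series.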