Exploring pre-calculated summary cell counts

This tutorial describes how to access pre-calculated summary cell counts. Each Census contains a top-level dataframe, census_summary_cell_counts, that summarizes counts of various cell labels. You can read it into a Pandas DataFrame.

Contents

Fetching the census_summary_cell_counts dataframe.
Creating summary counts beyond pre-calculated values.

⚠️ Note that the Census RNA data includes duplicate cells present across multiple datasets. Duplicate cells can be filtered in or out using the cell metadata variable is_primary_data which is described in the Census schema.

Fetching the `census_summary_cell_counts` dataframe

[1]:

import cellxgene_census

census = cellxgene_census.open_soma()
census_summary_cell_counts = census["census_info"]["summary_cell_counts"].read().concat().to_pandas()

# Dropping the soma_joinid column as it isn't useful in this demo
census_summary_cell_counts = census_summary_cell_counts.drop(columns=["soma_joinid"])

census_summary_cell_counts

The "stable" release is currently 2023-07-25. Specify 'census_version="2023-07-25"' in future calls to open_soma() to ensure data consistency.

[1]:

	organism	category	ontology_term_id	unique_cell_count	total_cell_count	label
0	Homo sapiens	all	na	33364242	56400873	na
1	Homo sapiens	assay	EFO:0008722	264166	279635	Drop-seq
2	Homo sapiens	assay	EFO:0008780	25652	51304	inDrop
3	Homo sapiens	assay	EFO:0008919	89477	206754	Seq-Well
4	Homo sapiens	assay	EFO:0008931	78750	188248	Smart-seq2
...	...	...	...	...	...	...
1357	Mus musculus	tissue_general	UBERON:0002113	179684	208324	kidney
1358	Mus musculus	tissue_general	UBERON:0002365	15577	31154	exocrine gland
1359	Mus musculus	tissue_general	UBERON:0002367	37715	130135	prostate gland
1360	Mus musculus	tissue_general	UBERON:0002368	13322	26644	endocrine gland
1361	Mus musculus	tissue_general	UBERON:0002371	90225	144962	bone marrow

1362 rows × 6 columns
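Once loaded, the summary is an ordinary Pandas DataFrame, so it can be filtered and extended like any other. The sketch below is illustrative only: rather than querying the Census, it rebuilds the first five rows printed above by hand, and the `unique_fraction` column is a name introduced here, not part of the Census.

```python
import pandas as pd

# Rows copied by hand from the output above; this is not a live query.
sample = pd.DataFrame(
    {
        "organism": ["Homo sapiens"] * 5,
        "category": ["all", "assay", "assay", "assay", "assay"],
        "ontology_term_id": ["na", "EFO:0008722", "EFO:0008780", "EFO:0008919", "EFO:0008931"],
        "unique_cell_count": [33364242, 264166, 25652, 89477, 78750],
        "total_cell_count": [56400873, 279635, 51304, 206754, 188248],
        "label": ["na", "Drop-seq", "inDrop", "Seq-Well", "Smart-seq2"],
    }
)

# Keep only the per-assay rows for human.
assays = sample[(sample["organism"] == "Homo sapiens") & (sample["category"] == "assay")].copy()

# Share of cells in each assay that are counted as unique (illustrative column name).
assays["unique_fraction"] = assays["unique_cell_count"] / assays["total_cell_count"]

print(assays[["label", "unique_cell_count", "total_cell_count", "unique_fraction"]])
```

The same filter should work unchanged on the full census_summary_cell_counts dataframe, since it only relies on the column names shown in the output.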

Creating summary counts beyond pre-calculated values.

The dataframe above is precomputed from the experiments in the Census, providing a quick overview of the Census contents.

You can do similar group statistics using Pandas groupby functions.

The code below reproduces the above counts for cell types using the full obs dataframe of the homo_sapiens experiment.

Keep in mind that the Census is very large, and an unrestricted query can return a significant amount of data. You can limit this by narrowing the request with the column_names and value_filter arguments of your query.

[2]:

human = census["census_data"]["homo_sapiens"]
obs_df = human.obs.read(column_names=["cell_type_ontology_term_id", "cell_type"]).concat().to_pandas()
obs_df.groupby(by=["cell_type_ontology_term_id", "cell_type"], as_index=False, observed=True).size()

[2]:

	cell_type_ontology_term_id	cell_type	size
0	CL:0000001	primary cultured cell	80
1	CL:0000003	native cell	1308000
2	CL:0000006	neuronal receptor cell	2502
3	CL:0000015	male germ cell	621
4	CL:0000019	sperm	22
...	...	...	...
608	CL:4028006	alveolar type 2 fibroblast cell	38250
609	CL:4030009	epithelial cell of proximal tubule segment 1	777
610	CL:4030011	epithelial cell of proximal tubule segment 3	989
611	CL:4030018	kidney connecting tubule principal cell	107
612	CL:4030023	respiratory hillock cell	10170

613 rows × 3 columns
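The query above restricts columns with column_names but still reads every human cell. As a sketch of how value_filter could narrow it further, the helpers below count cell types for a single tissue and, optionally, only for cells flagged by is_primary_data. Both function names are hypothetical, and the example assumes that tissue_general and is_primary_data are available as obs columns, as the Census schema describes; "kidney" is used only because it appears in the summary table.

```python
def build_obs_filter(tissue_general, primary_only=True):
    """Hypothetical helper: assemble a value_filter string for an obs query."""
    clauses = [f"tissue_general == '{tissue_general}'"]
    if primary_only:
        # Drop duplicate cells by keeping only rows flagged as primary data.
        clauses.append("is_primary_data == True")
    return " and ".join(clauses)


def count_cell_types(census, tissue_general, primary_only=True):
    """Hypothetical helper: per-cell-type counts for one tissue.

    Expects an open Census handle, such as the one returned by open_soma().
    """
    human = census["census_data"]["homo_sapiens"]
    obs_df = (
        human.obs.read(
            column_names=[
                "cell_type_ontology_term_id",
                "cell_type",
                "tissue_general",
                "is_primary_data",
            ],
            value_filter=build_obs_filter(tissue_general, primary_only),
        )
        .concat()
        .to_pandas()
    )
    return obs_df.groupby(
        by=["cell_type_ontology_term_id", "cell_type"], as_index=False, observed=True
    ).size()


# The filter string alone can be inspected without opening the Census.
print(build_obs_filter("kidney"))
```

With the Census still open, calling `count_cell_types(census, "kidney")` would be expected to return a table shaped like the output above, restricted to kidney cells.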

Close the census when complete.

[3]:

census.close()
