Exploring pre-calculated summary cell counts
This tutorial describes how to access pre-calculated summary cell counts. Each Census contains a top-level dataframe that summarizes counts of various cell labels: the census_summary_cell_counts dataframe. You can read it into a Pandas DataFrame.
Contents
Fetching the census_summary_cell_counts dataframe.
Creating summary counts beyond pre-calculated values.
⚠️ Note that the Census RNA data includes duplicate cells present across multiple datasets. Duplicate cells can be filtered in or out using the cell metadata variable is_primary_data, which is described in the Census schema.
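As a minimal sketch of the filtering mentioned in the note, the helper below reads cell metadata for either primary or duplicate cells by passing a value_filter on is_primary_data to obs.read(). The function name read_obs_by_primary_flag and its parameters are hypothetical, chosen only for illustration; it assumes you pass in a Census experiment such as census["census_data"]["homo_sapiens"].

```python
def read_obs_by_primary_flag(experiment, column_names, primary=True):
    """Read obs metadata for primary cells (primary=True) or duplicates (primary=False).

    `experiment` is assumed to be a Census experiment, for example
    census["census_data"]["homo_sapiens"]. This helper is illustrative and
    not part of the cellxgene_census API.
    """
    # Build the filter expression: "is_primary_data == True" or "... == False".
    value_filter = f"is_primary_data == {bool(primary)}"
    return (
        experiment.obs.read(column_names=column_names, value_filter=value_filter)
        .concat()
        .to_pandas()
    )
```

With an open census handle, calling read_obs_by_primary_flag(census["census_data"]["homo_sapiens"], ["cell_type"]) should return only cells flagged as primary data, so each cell is counted once.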
Fetching the census_summary_cell_counts dataframe
[1]:
import cellxgene_census
census = cellxgene_census.open_soma()
census_summary_cell_counts = census["census_info"]["summary_cell_counts"].read().concat().to_pandas()
# Dropping the soma_joinid column as it isn't useful in this demo
census_summary_cell_counts = census_summary_cell_counts.drop(columns=["soma_joinid"])
census_summary_cell_counts
The "stable" release is currently 2023-07-25. Specify 'census_version="2023-07-25"' in future calls to open_soma() to ensure data consistency.
[1]:
| | organism | category | ontology_term_id | unique_cell_count | total_cell_count | label |
---|---|---|---|---|---|---|
0 | Homo sapiens | all | na | 33364242 | 56400873 | na |
1 | Homo sapiens | assay | EFO:0008722 | 264166 | 279635 | Drop-seq |
2 | Homo sapiens | assay | EFO:0008780 | 25652 | 51304 | inDrop |
3 | Homo sapiens | assay | EFO:0008919 | 89477 | 206754 | Seq-Well |
4 | Homo sapiens | assay | EFO:0008931 | 78750 | 188248 | Smart-seq2 |
... | ... | ... | ... | ... | ... | ... |
1357 | Mus musculus | tissue_general | UBERON:0002113 | 179684 | 208324 | kidney |
1358 | Mus musculus | tissue_general | UBERON:0002365 | 15577 | 31154 | exocrine gland |
1359 | Mus musculus | tissue_general | UBERON:0002367 | 37715 | 130135 | prostate gland |
1360 | Mus musculus | tissue_general | UBERON:0002368 | 13322 | 26644 | endocrine gland |
1361 | Mus musculus | tissue_general | UBERON:0002371 | 90225 | 144962 | bone marrow |
1362 rows × 6 columns
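Once the summary is a Pandas DataFrame, ordinary Pandas selection is enough to explore it. The sketch below picks the labels with the most unique cells for one organism and category. The helper name top_labels is hypothetical, and it assumes the column names shown in the table above.

```python
import pandas as pd


def top_labels(summary: pd.DataFrame, organism: str, category: str, n: int = 5) -> pd.DataFrame:
    """Return the n labels with the most unique cells for one organism and category.

    Illustrative helper; `summary` is expected to have the columns of
    census_summary_cell_counts shown above.
    """
    # Keep only the rows for the requested organism and category.
    mask = (summary["organism"] == organism) & (summary["category"] == category)
    # Sort the remaining rows by unique_cell_count and keep the top n.
    top = summary.loc[mask].nlargest(n, "unique_cell_count")
    return top[["label", "unique_cell_count", "total_cell_count"]]
```

For example, top_labels(census_summary_cell_counts, "Homo sapiens", "assay") would list the human assays ranked by unique cell count.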
Creating summary counts beyond pre-calculated values.
The dataframe above is precomputed from the experiments in the Census, providing a quick overview of the Census contents.
You can compute similar group statistics using the Pandas groupby function.
The code below reproduces the above counts for cell types using the full obs dataframe of the homo_sapiens experiment.
Keep in mind that the Census is very large, so queries can return a significant amount of data. You can manage that by narrowing the request with the column_names and value_filter parameters of your query.
[2]:
human = census["census_data"]["homo_sapiens"]
obs_df = human.obs.read(column_names=["cell_type_ontology_term_id", "cell_type"]).concat().to_pandas()
obs_df.groupby(by=["cell_type_ontology_term_id", "cell_type"], as_index=False, observed=True).size()
[2]:
| | cell_type_ontology_term_id | cell_type | size |
---|---|---|---|
0 | CL:0000001 | primary cultured cell | 80 |
1 | CL:0000003 | native cell | 1308000 |
2 | CL:0000006 | neuronal receptor cell | 2502 |
3 | CL:0000015 | male germ cell | 621 |
4 | CL:0000019 | sperm | 22 |
... | ... | ... | ... |
608 | CL:4028006 | alveolar type 2 fibroblast cell | 38250 |
609 | CL:4030009 | epithelial cell of proximal tubule segment 1 | 777 |
610 | CL:4030011 | epithelial cell of proximal tubule segment 3 | 989 |
611 | CL:4030018 | kidney connecting tubule principal cell | 107 |
612 | CL:4030023 | respiratory hillock cell | 10170 |
613 rows × 3 columns
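The groupby above counts every row, duplicates included. The sketch below extends it to produce both a total and a unique count per cell type, on the assumption that the unique count corresponds to rows where is_primary_data is True; check that reading against the Census schema. The helper name count_cells_by_type is hypothetical, and it expects obs to have been read with is_primary_data added to column_names.

```python
import pandas as pd


def count_cells_by_type(obs_df: pd.DataFrame) -> pd.DataFrame:
    """Return total and unique (primary) cell counts per cell type.

    Illustrative helper; expects the columns cell_type_ontology_term_id,
    cell_type, and is_primary_data (boolean).
    """
    return obs_df.groupby(
        by=["cell_type_ontology_term_id", "cell_type"],
        as_index=False,
        observed=True,
    ).agg(
        # Every row in the group, duplicates included.
        total_cell_count=("is_primary_data", "size"),
        # Summing booleans counts the rows flagged as primary data.
        unique_cell_count=("is_primary_data", "sum"),
    )
```

To use it, read obs with column_names=["cell_type_ontology_term_id", "cell_type", "is_primary_data"] and pass the resulting DataFrame to count_cells_by_type.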
Close the census when complete.
[3]:
census.close()
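As an alternative to calling close() yourself, open_soma() can be used as a context manager so the census is closed even if a query raises. The sketch below also pins census_version, as the message printed by open_soma() earlier recommends. The function name read_cell_types is hypothetical, and the import is placed inside the function only so that defining the helper does not require the package.

```python
def read_cell_types(census_version="2023-07-25"):
    """Read cell type labels for all human cells from a pinned Census release.

    Illustrative helper, not part of the cellxgene_census API.
    """
    # Imported here so that defining the helper does not require the package.
    import cellxgene_census

    # The context manager closes the census on exit, even if the read fails.
    with cellxgene_census.open_soma(census_version=census_version) as census:
        human = census["census_data"]["homo_sapiens"]
        return (
            human.obs.read(column_names=["cell_type_ontology_term_id", "cell_type"])
            .concat()
            .to_pandas()
        )
```

Because to_pandas() runs inside the with block, the returned DataFrame is fully in memory and stays usable after the census has been closed.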