{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploring pre-calculated summary cell counts\n", "\n", "This tutorial describes how to access pre-calculated summary cell counts. Each Census contains a top-level dataframe summarizing counts of various cell labels: the `census_summary_cell_counts` dataframe. You can read it into a Pandas DataFrame.\n", "\n", "**Contents**\n", "\n", "1. Fetching the `census_summary_cell_counts` dataframe.\n", "2. Creating summary counts beyond pre-calculated values.\n", "\n", "⚠️ Note that the Census RNA data includes duplicate cells present across multiple datasets. Duplicate cells can be filtered in or out using the cell metadata variable `is_primary_data`, which is described in the [Census schema](https://github.com/chanzuckerberg/cellxgene-census/blob/main/docs/cellxgene_census_schema.md#repeated-data).\n", "\n", "## Fetching the `census_summary_cell_counts` dataframe" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2023-07-28T16:17:28.143432Z", "iopub.status.busy": "2023-07-28T16:17:28.143007Z", "iopub.status.idle": "2023-07-28T16:17:31.207795Z", "shell.execute_reply": "2023-07-28T16:17:31.207159Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The \"stable\" release is currently 2023-07-25. Specify 'census_version=\"2023-07-25\"' in future calls to open_soma() to ensure data consistency.\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>organism</th><th>category</th><th>ontology_term_id</th><th>unique_cell_count</th><th>total_cell_count</th><th>label</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>Homo sapiens</td><td>all</td><td>na</td><td>33364242</td><td>56400873</td><td>na</td></tr>\n",
"    <tr><th>1</th><td>Homo sapiens</td><td>assay</td><td>EFO:0008722</td><td>264166</td><td>279635</td><td>Drop-seq</td></tr>\n",
"    <tr><th>2</th><td>Homo sapiens</td><td>assay</td><td>EFO:0008780</td><td>25652</td><td>51304</td><td>inDrop</td></tr>\n",
"    <tr><th>3</th><td>Homo sapiens</td><td>assay</td><td>EFO:0008919</td><td>89477</td><td>206754</td><td>Seq-Well</td></tr>\n",
"    <tr><th>4</th><td>Homo sapiens</td><td>assay</td><td>EFO:0008931</td><td>78750</td><td>188248</td><td>Smart-seq2</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>1357</th><td>Mus musculus</td><td>tissue_general</td><td>UBERON:0002113</td><td>179684</td><td>208324</td><td>kidney</td></tr>\n",
"    <tr><th>1358</th><td>Mus musculus</td><td>tissue_general</td><td>UBERON:0002365</td><td>15577</td><td>31154</td><td>exocrine gland</td></tr>\n",
"    <tr><th>1359</th><td>Mus musculus</td><td>tissue_general</td><td>UBERON:0002367</td><td>37715</td><td>130135</td><td>prostate gland</td></tr>\n",
"    <tr><th>1360</th><td>Mus musculus</td><td>tissue_general</td><td>UBERON:0002368</td><td>13322</td><td>26644</td><td>endocrine gland</td></tr>\n",
"    <tr><th>1361</th><td>Mus musculus</td><td>tissue_general</td><td>UBERON:0002371</td><td>90225</td><td>144962</td><td>bone marrow</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>1362 rows × 6 columns</p>\n",
"</div>
" ], "text/plain": [ " organism category ontology_term_id unique_cell_count \\\n", "0 Homo sapiens all na 33364242 \n", "1 Homo sapiens assay EFO:0008722 264166 \n", "2 Homo sapiens assay EFO:0008780 25652 \n", "3 Homo sapiens assay EFO:0008919 89477 \n", "4 Homo sapiens assay EFO:0008931 78750 \n", "... ... ... ... ... \n", "1357 Mus musculus tissue_general UBERON:0002113 179684 \n", "1358 Mus musculus tissue_general UBERON:0002365 15577 \n", "1359 Mus musculus tissue_general UBERON:0002367 37715 \n", "1360 Mus musculus tissue_general UBERON:0002368 13322 \n", "1361 Mus musculus tissue_general UBERON:0002371 90225 \n", "\n", " total_cell_count label \n", "0 56400873 na \n", "1 279635 Drop-seq \n", "2 51304 inDrop \n", "3 206754 Seq-Well \n", "4 188248 Smart-seq2 \n", "... ... ... \n", "1357 208324 kidney \n", "1358 31154 exocrine gland \n", "1359 130135 prostate gland \n", "1360 26644 endocrine gland \n", "1361 144962 bone marrow \n", "\n", "[1362 rows x 6 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import cellxgene_census\n", "\n", "census = cellxgene_census.open_soma()\n", "census_summary_cell_counts = census[\"census_info\"][\"summary_cell_counts\"].read().concat().to_pandas()\n", "\n", "# Dropping the soma_joinid column as it isn't useful in this demo\n", "census_summary_cell_counts = census_summary_cell_counts.drop(columns=[\"soma_joinid\"])\n", "\n", "census_summary_cell_counts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating summary counts beyond pre-calculated values.\n", "\n", "The dataframe above is precomputed from the experiments in the Census, providing a quick overview of the Census contents.\n", "\n", "You can do similar group statistics using Pandas `groupby` functions. 
\n", "\n", "The code below reproduces the counts above using the full `obs` dataframe in the `homo_sapiens` experiment.\n", "\n", "Keep in mind that the Census is very large, and queries can return a significant amount of data. You can manage that by narrowing the query request using `column_names` and `value_filter` in your query." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2023-07-28T16:17:31.210438Z", "iopub.status.busy": "2023-07-28T16:17:31.210021Z", "iopub.status.idle": "2023-07-28T16:17:43.764065Z", "shell.execute_reply": "2023-07-28T16:17:43.763547Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>cell_type_ontology_term_id</th><th>cell_type</th><th>size</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>CL:0000001</td><td>primary cultured cell</td><td>80</td></tr>\n",
"    <tr><th>1</th><td>CL:0000003</td><td>native cell</td><td>1308000</td></tr>\n",
"    <tr><th>2</th><td>CL:0000006</td><td>neuronal receptor cell</td><td>2502</td></tr>\n",
"    <tr><th>3</th><td>CL:0000015</td><td>male germ cell</td><td>621</td></tr>\n",
"    <tr><th>4</th><td>CL:0000019</td><td>sperm</td><td>22</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>608</th><td>CL:4028006</td><td>alveolar type 2 fibroblast cell</td><td>38250</td></tr>\n",
"    <tr><th>609</th><td>CL:4030009</td><td>epithelial cell of proximal tubule segment 1</td><td>777</td></tr>\n",
"    <tr><th>610</th><td>CL:4030011</td><td>epithelial cell of proximal tubule segment 3</td><td>989</td></tr>\n",
"    <tr><th>611</th><td>CL:4030018</td><td>kidney connecting tubule principal cell</td><td>107</td></tr>\n",
"    <tr><th>612</th><td>CL:4030023</td><td>respiratory hillock cell</td><td>10170</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>613 rows × 3 columns</p>\n",
"</div>
" ], "text/plain": [ " cell_type_ontology_term_id cell_type \\\n", "0 CL:0000001 primary cultured cell \n", "1 CL:0000003 native cell \n", "2 CL:0000006 neuronal receptor cell \n", "3 CL:0000015 male germ cell \n", "4 CL:0000019 sperm \n", ".. ... ... \n", "608 CL:4028006 alveolar type 2 fibroblast cell \n", "609 CL:4030009 epithelial cell of proximal tubule segment 1 \n", "610 CL:4030011 epithelial cell of proximal tubule segment 3 \n", "611 CL:4030018 kidney connecting tubule principal cell \n", "612 CL:4030023 respiratory hillock cell \n", "\n", " size \n", "0 80 \n", "1 1308000 \n", "2 2502 \n", "3 621 \n", "4 22 \n", ".. ... \n", "608 38250 \n", "609 777 \n", "610 989 \n", "611 107 \n", "612 10170 \n", "\n", "[613 rows x 3 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "human = census[\"census_data\"][\"homo_sapiens\"]\n", "obs_df = human.obs.read(column_names=[\"cell_type_ontology_term_id\", \"cell_type\"]).concat().to_pandas()\n", "obs_df.groupby(by=[\"cell_type_ontology_term_id\", \"cell_type\"], as_index=False, observed=True).size()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Close the census when complete. 
" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2023-07-28T16:17:43.766821Z", "iopub.status.busy": "2023-07-28T16:17:43.766325Z", "iopub.status.idle": "2023-07-28T16:17:43.769229Z", "shell.execute_reply": "2023-07-28T16:17:43.768748Z" } }, "outputs": [], "source": [ "census.close()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.10" }, "vscode": { "interpreter": { "hash": "3da8ec1c162cd849e59e6ea2824b2e353dce799884e910aae99411be5277f953" } } }, "nbformat": 4, "nbformat_minor": 2 }