{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Genes measured in each cell (dataset presence matrix)\n", "\n", "The Census is a compilation of cells from multiple datasets that may differ in the sets of genes they measure. This notebook describes how to identify the genes measured in each dataset.\n", "\n", "The presence matrix is a sparse boolean array indicating which features (var) were present in each dataset. The array has dimensions [n_datasets, n_var], and is stored in the SOMA Measurement `varp` collection. The first dimension is indexed by the `soma_joinid` in the `census_datasets` dataframe. The second is indexed by the `soma_joinid` in the `var` dataframe of the measurement.\n", "\n", "As a reminder, the `obs` dataframe has a column `dataset_id` that can be used to link any cell in the Census to the presence matrix.\n", "\n", "**Contents** \n", "\n", "1. Opening the Census.\n", "2. Fetching the IDs of the Census datasets.\n", "3. Fetching the dataset presence matrix.\n", "4. Identifying genes measured in a specific dataset.\n", "5. Identifying datasets that measured specific genes.\n", "6. Identifying all genes measured in a dataset.\n", "\n", "⚠️ Note that the Census RNA data includes duplicate cells present across multiple datasets. Duplicate cells can be filtered in or out using the cell metadata variable `is_primary_data`, which is described in the [Census schema](https://github.com/chanzuckerberg/cellxgene-census/blob/main/docs/cellxgene_census_schema.md#repeated-data).\n", "\n", "## Opening the Census\n", "\n", "The `cellxgene_census` Python package provides a convenient API to open the latest stable release of the Census." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2023-07-28T14:31:42.113710Z", "iopub.status.busy": "2023-07-28T14:31:42.113383Z", "iopub.status.idle": "2023-07-28T14:31:44.521274Z", "shell.execute_reply": "2023-07-28T14:31:44.520660Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The \"stable\" release is currently 2023-07-25. Specify 'census_version=\"2023-07-25\"' in future calls to open_soma() to ensure data consistency.\n" ] } ], "source": [ "import cellxgene_census\n", "\n", "census = cellxgene_census.open_soma()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fetching the IDs of the Census datasets\n", "\n", "Let's grab a table of all the datasets included in the Census and use this table in combination with the presence matrix below." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2023-07-28T14:31:44.524333Z", "iopub.status.busy": "2023-07-28T14:31:44.523881Z", "iopub.status.idle": "2023-07-28T14:31:45.361846Z", "shell.execute_reply": "2023-07-28T14:31:45.361322Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " soma_joinid collection_id \\\n", "0 0 e2c257e7-6f79-487c-b81c-39451cd4ab3c \n", "1 1 e2c257e7-6f79-487c-b81c-39451cd4ab3c \n", "2 2 e2c257e7-6f79-487c-b81c-39451cd4ab3c \n", "3 3 f7cecffa-00b4-4560-a29a-8ad626b8ee08 \n", "4 4 3f50314f-bdc9-40c6-8e4a-b0901ebfbe4c \n", ".. ... ... \n", "588 588 180bff9c-c8a5-4539-b13b-ddbc00d643e6 \n", "589 589 a72afd53-ab92-4511-88da-252fb0e26b9a \n", "590 590 38833785-fac5-48fd-944a-0f62a4c23ed1 \n", "591 591 5d445965-6f1a-4b68-ba3a-b8f765155d3a \n", "592 592 5d445965-6f1a-4b68-ba3a-b8f765155d3a \n", "\n", " collection_name \\\n", "0 Spatial multiomics map of trophoblast developm... \n", "1 Spatial multiomics map of trophoblast developm... \n", "2 Spatial multiomics map of trophoblast developm... \n", "3 Mapping single-cell transcriptomes in the intr... \n", "4 Single-cell sequencing links multiregional imm... \n", ".. ... \n", "588 Molecular characterization of selectively vuln... \n", "589 Single-cell atlas of peripheral immune respons... \n", "590 Construction of a human cell landscape at sing... \n", "591 A molecular cell atlas of the human lung from ... \n", "592 A molecular cell atlas of the human lung from ... \n", "\n", " collection_doi dataset_id \\\n", "0 10.1038/s41586-023-05869-0 f171db61-e57e-4535-a06a-35d8b6ef8f2b \n", "1 10.1038/s41586-023-05869-0 ecf2e08e-2032-4a9e-b466-b65b395f4a02 \n", "2 10.1038/s41586-023-05869-0 74cff64f-9da9-4b2a-9b3b-8a04a1598040 \n", "3 10.1016/j.ccell.2022.11.001 5af90777-6760-4003-9dba-8f945fec6fdf \n", "4 10.1016/j.ccell.2021.03.007 bd65a70f-b274-4133-b9dd-0d1431b6af34 \n", ".. ... ... 
\n", "588 10.1038/s41593-020-00764-7 f9ad5649-f372-43e1-a3a8-423383e5a8a2 \n", "589 10.1038/s41591-020-0944-y 456e8b9b-f872-488b-871d-94534090a865 \n", "590 10.1038/s41586-020-2157-4 2adb1f8a-a6b1-4909-8ee8-484814e2d4bf \n", "591 10.1038/s41586-020-2922-4 e04daea4-4412-45b5-989e-76a9be070a89 \n", "592 10.1038/s41586-020-2922-4 8c42cfd0-0b0a-46d5-910c-fc833d83c45e \n", "\n", " dataset_title \\\n", "0 donor_p13_trophoblasts \n", "1 All donors trophoblasts \n", "2 All donors all cell states (in vivo) \n", "3 Single-cell transcriptomic datasets of Renal c... \n", "4 Single-cell sequencing links multiregional imm... \n", ".. ... \n", "588 Molecular characterization of selectively vuln... \n", "589 Single-cell atlas of peripheral immune respons... \n", "590 Construction of a human cell landscape at sing... \n", "591 Krasnow Lab Human Lung Cell Atlas, Smart-seq2 \n", "592 Krasnow Lab Human Lung Cell Atlas, 10X \n", "\n", " dataset_h5ad_path dataset_total_cell_count \n", "0 f171db61-e57e-4535-a06a-35d8b6ef8f2b.h5ad 31497 \n", "1 ecf2e08e-2032-4a9e-b466-b65b395f4a02.h5ad 67070 \n", "2 74cff64f-9da9-4b2a-9b3b-8a04a1598040.h5ad 286326 \n", "3 5af90777-6760-4003-9dba-8f945fec6fdf.h5ad 270855 \n", "4 bd65a70f-b274-4133-b9dd-0d1431b6af34.h5ad 167283 \n", ".. ... ... 
\n", "588 f9ad5649-f372-43e1-a3a8-423383e5a8a2.h5ad 8168 \n", "589 456e8b9b-f872-488b-871d-94534090a865.h5ad 44721 \n", "590 2adb1f8a-a6b1-4909-8ee8-484814e2d4bf.h5ad 598266 \n", "591 e04daea4-4412-45b5-989e-76a9be070a89.h5ad 9409 \n", "592 8c42cfd0-0b0a-46d5-910c-fc833d83c45e.h5ad 65662 \n", "\n", "[593 rows x 8 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Grab the experiment containing human data, and the measurement therein with RNA\n", "human = census[\"census_data\"][\"homo_sapiens\"]\n", "human_rna = human.ms[\"RNA\"]\n", "\n", "# The census-wide datasets\n", "datasets_df = census[\"census_info\"][\"datasets\"].read().concat().to_pandas()\n", "\n", "datasets_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fetching the dataset presence matrix\n", "\n", "Now let's fetch the dataset presence matrix. \n", "\n", "For convenience, read the entire presence matrix (for Homo sapiens) into memory as a SciPy sparse matrix. The `cellxgene_census.get_presence_matrix` function provides this capability and returns the matrix as a `scipy.sparse.csr_matrix`." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2023-07-28T14:31:45.364362Z", "iopub.status.busy": "2023-07-28T14:31:45.364104Z", "iopub.status.idle": "2023-07-28T14:31:46.884746Z", "shell.execute_reply": "2023-07-28T14:31:46.884057Z" } }, "outputs": [ { "data": { "text/plain": [ "<593x60664 sparse matrix of type ''\n", "\twith 16133717 stored elements in Compressed Sparse Row format>" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "presence_matrix = cellxgene_census.get_presence_matrix(census, organism=\"Homo sapiens\", measurement_name=\"RNA\")\n", "\n", "presence_matrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also need the `var` dataframe, which is read into a Pandas DataFrame for convenient manipulation:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2023-07-28T14:31:46.887589Z", "iopub.status.busy": "2023-07-28T14:31:46.887321Z", "iopub.status.idle": "2023-07-28T14:31:47.356841Z", "shell.execute_reply": "2023-07-28T14:31:47.356313Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " soma_joinid feature_id feature_name feature_length\n", "0 0 ENSG00000121410 A1BG 3999\n", "1 1 ENSG00000268895 A1BG-AS1 3374\n", "2 2 ENSG00000148584 A1CF 9603\n", "3 3 ENSG00000175899 A2M 6318\n", "4 4 ENSG00000245105 A2M-AS1 2948\n", "... ... ... ... ...\n", "60659 60659 ENSG00000288719 RP4-669P10.21 4252\n", "60660 60660 ENSG00000288720 RP11-852E15.3 7007\n", "60661 60661 ENSG00000288721 RP5-973N23.5 7765\n", "60662 60662 ENSG00000288723 RP11-553N16.6 1015\n", "60663 60663 ENSG00000288724 RP13-546I2.2 625\n", "\n", "[60664 rows x 4 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var_df = human_rna.var.read().concat().to_pandas()\n", "\n", "var_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Identifying genes measured in a specific dataset\n", "\n", "Now that we have the dataset table, the genes metadata table, and the dataset presence matrix, we can check whether a gene or a set of genes was measured in a specific dataset.\n", "\n", "**Important:** the presence matrix is indexed by `soma_joinid`, and is *NOT* positionally indexed. 
In other words:\n", "\n", "* the first dimension of the presence matrix is the dataset's `soma_joinid`, as stored in the `census_datasets` dataframe.\n", "* the second dimension of the presence matrix is the feature's `soma_joinid`, as stored in the `var` dataframe.\n", "\n", "Let's find out if the gene `\"ENSG00000286096\"` was measured in the dataset with id `\"97a17473-e2b1-4f31-a544-44a60773e2dd\"`.\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2023-07-28T14:31:47.359369Z", "iopub.status.busy": "2023-07-28T14:31:47.359100Z", "iopub.status.idle": "2023-07-28T14:31:47.368533Z", "shell.execute_reply": "2023-07-28T14:31:47.368030Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Feature is present.\n" ] } ], "source": [ "var_joinid = var_df.loc[var_df.feature_id == \"ENSG00000286096\"].soma_joinid\n", "dataset_joinid = datasets_df.loc[datasets_df.dataset_id == \"97a17473-e2b1-4f31-a544-44a60773e2dd\"].soma_joinid\n", "is_present = presence_matrix[dataset_joinid, var_joinid][0, 0]\n", "print(f'Feature is {\"present\" if is_present else \"not present\"}.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Identifying datasets that measured specific genes\n", "\n", "Similarly, we can determine the datasets that measured a specific gene or set of genes." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2023-07-28T14:31:47.370759Z", "iopub.status.busy": "2023-07-28T14:31:47.370521Z", "iopub.status.idle": "2023-07-28T14:31:47.417709Z", "shell.execute_reply": "2023-07-28T14:31:47.417245Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " soma_joinid collection_id \\\n", "5 5 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "6 6 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "7 7 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "8 8 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "9 9 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "11 11 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "12 12 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "14 14 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "15 15 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "19 19 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "20 20 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "21 21 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "23 23 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "24 24 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "25 25 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "26 26 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "27 27 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "43 43 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "139 139 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "143 143 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "160 160 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "165 165 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "175 175 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "176 176 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "177 177 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "178 178 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "183 183 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "196 196 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "197 197 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "206 206 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "208 208 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "220 220 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "243 243 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "245 245 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "249 249 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "270 270 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "273 273 283d65eb-dd53-496d-adb7-7570c7caa443 \n", "475 475 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "476 476 e5f58829-1a66-40b5-a624-9046778e74f5 \n", 
"477 477 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "478 478 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "479 479 e5f58829-1a66-40b5-a624-9046778e74f5 \n", "\n", " collection_name \\\n", "5 Tabula Sapiens \n", "6 Tabula Sapiens \n", "7 Tabula Sapiens \n", "8 Tabula Sapiens \n", "9 Tabula Sapiens \n", "11 Tabula Sapiens \n", "12 Tabula Sapiens \n", "14 Tabula Sapiens \n", "15 Tabula Sapiens \n", "19 Tabula Sapiens \n", "20 Tabula Sapiens \n", "21 Tabula Sapiens \n", "23 Tabula Sapiens \n", "24 Tabula Sapiens \n", "25 Tabula Sapiens \n", "26 Tabula Sapiens \n", "27 Tabula Sapiens \n", "43 Transcriptomic diversity of cell types across ... \n", "139 Transcriptomic diversity of cell types across ... \n", "143 Transcriptomic diversity of cell types across ... \n", "160 Transcriptomic diversity of cell types across ... \n", "165 Transcriptomic diversity of cell types across ... \n", "175 Transcriptomic diversity of cell types across ... \n", "176 Transcriptomic diversity of cell types across ... \n", "177 Transcriptomic diversity of cell types across ... \n", "178 Transcriptomic diversity of cell types across ... \n", "183 Transcriptomic diversity of cell types across ... \n", "196 Transcriptomic diversity of cell types across ... \n", "197 Transcriptomic diversity of cell types across ... \n", "206 Transcriptomic diversity of cell types across ... \n", "208 Transcriptomic diversity of cell types across ... \n", "220 Transcriptomic diversity of cell types across ... \n", "243 Transcriptomic diversity of cell types across ... \n", "245 Transcriptomic diversity of cell types across ... \n", "249 Transcriptomic diversity of cell types across ... \n", "270 Transcriptomic diversity of cell types across ... \n", "273 Transcriptomic diversity of cell types across ... 
\n", "475 Tabula Sapiens \n", "476 Tabula Sapiens \n", "477 Tabula Sapiens \n", "478 Tabula Sapiens \n", "479 Tabula Sapiens \n", "\n", " collection_doi dataset_id \\\n", "5 10.1126/science.abl4896 ff45e623-7f5f-46e3-b47d-56be0341f66b \n", "6 10.1126/science.abl4896 f01bdd17-4902-40f5-86e3-240d66dd2587 \n", "7 10.1126/science.abl4896 e6a11140-2545-46bc-929e-da243eed2cae \n", "8 10.1126/science.abl4896 e5c63d94-593c-4338-a489-e1048599e751 \n", "9 10.1126/science.abl4896 d8732da6-8d1d-42d9-b625-f2416c30054b \n", "11 10.1126/science.abl4896 cee11228-9f0b-4e57-afe2-cfe15ee56312 \n", "12 10.1126/science.abl4896 a357414d-2042-4eb5-95f0-c58604a18bdd \n", "14 10.1126/science.abl4896 a0754256-f44b-4c4a-962c-a552e47d3fdc \n", "15 10.1126/science.abl4896 983d5ec9-40e8-4512-9e65-a572a9c486cb \n", "19 10.1126/science.abl4896 5e5e7a2f-8f1c-42ac-90dc-b4f80f38e84c \n", "20 10.1126/science.abl4896 55cf0ea3-9d2b-4294-871e-bb4b49a79fc7 \n", "21 10.1126/science.abl4896 4f1555bc-4664-46c3-a606-78d34dd10d92 \n", "23 10.1126/science.abl4896 2423ce2c-3149-4cca-a2ff-cf682ea29b5f \n", "24 10.1126/science.abl4896 1c9eb291-6d31-47e1-96b2-129b5e1ae64f \n", "25 10.1126/science.abl4896 18eb630b-a754-4111-8cd4-c24ec80aa5ec \n", "26 10.1126/science.abl4896 0d2ee4ac-05ee-40b2-afb6-ebb584caa867 \n", "27 10.1126/science.abl4896 0ced5e76-6040-47ff-8a72-93847965afc0 \n", "43 10.1101/2022.10.12.511898 8e10f1c4-8e98-41e5-b65f-8cd89a887122 \n", "139 10.1101/2022.10.12.511898 fe1a73ab-a203-45fd-84e9-0f7fd19efcbd \n", "143 10.1101/2022.10.12.511898 f8dda921-5fb4-4c94-a654-c6fc346bfd6d \n", "160 10.1101/2022.10.12.511898 dd03ce70-3243-4c96-9561-330cc461e4d7 \n", "165 10.1101/2022.10.12.511898 d2b5efc1-14c6-4b5f-bd98-40f9084872d7 \n", "175 10.1101/2022.10.12.511898 c4b03352-af8d-492a-8d6b-40f304e0a122 \n", "176 10.1101/2022.10.12.511898 c2aad8fc-b63b-4f9b-9cfd-baf7bc9c1771 \n", "177 10.1101/2022.10.12.511898 c202b243-1aa1-4b16-bc9a-b36241f3b1e3 \n", "178 10.1101/2022.10.12.511898 
bdb26abd-f4ba-4ea3-8862-c2340e7a4f55 \n", "183 10.1101/2022.10.12.511898 acae7679-d077-461c-b857-ee6ccfeb267f \n", "196 10.1101/2022.10.12.511898 9372df2d-13d6-4fac-980b-919a5b7eb483 \n", "197 10.1101/2022.10.12.511898 93131426-0124-4ab4-a013-9dfbcd99d467 \n", "206 10.1101/2022.10.12.511898 7c1c3d47-3166-43e5-9a95-65ceb2d45f78 \n", "208 10.1101/2022.10.12.511898 7a0a8891-9a22-4549-a55b-c2aca23c3a2a \n", "220 10.1101/2022.10.12.511898 5e5ab909-f73f-4b57-98a0-6d2c5662f6a4 \n", "243 10.1101/2022.10.12.511898 3f56901c-dd4a-47d6-b60b-7b0c0111cfb2 \n", "245 10.1101/2022.10.12.511898 3a7f3ab4-a280-4b3b-b2c0-6dd05614a78c \n", "249 10.1101/2022.10.12.511898 35c8a04c-8639-4d15-8228-765d8d93fc96 \n", "270 10.1101/2022.10.12.511898 07b1d7c8-5c2e-42f7-9246-26f746cd6013 \n", "273 10.1101/2022.10.12.511898 0325478a-9b52-45b5-b40a-2e2ab0d72eb1 \n", "475 10.1126/science.abl4896 53d208b0-2cfd-4366-9866-c3c6114081bc \n", "476 10.1126/science.abl4896 a68b64d8-aee3-4947-81b7-36b8fe5a44d2 \n", "477 10.1126/science.abl4896 c5d88abe-f23a-45fa-a534-788985e93dad \n", "478 10.1126/science.abl4896 5a11f879-d1ef-458a-910c-9b0bdfca5ebf \n", "479 10.1126/science.abl4896 97a17473-e2b1-4f31-a544-44a60773e2dd \n", "\n", " dataset_title \\\n", "5 Tabula Sapiens - Pancreas \n", "6 Tabula Sapiens - Salivary_Gland \n", "7 Tabula Sapiens - Heart \n", "8 Tabula Sapiens - Bladder \n", "9 Tabula Sapiens - Trachea \n", "11 Tabula Sapiens - Spleen \n", "12 Tabula Sapiens - Small_Intestine \n", "14 Tabula Sapiens - Eye \n", "15 Tabula Sapiens - Blood \n", "19 Tabula Sapiens - Fat \n", "20 Tabula Sapiens - Tongue \n", "21 Tabula Sapiens - Bone_Marrow \n", "23 Tabula Sapiens - Kidney \n", "24 Tabula Sapiens - Muscle \n", "25 Tabula Sapiens - Lymph_Node \n", "26 Tabula Sapiens - Lung \n", "27 Tabula Sapiens - Thymus \n", "43 All neurons \n", "139 Dissection: Amygdaloid complex (AMY) - basolat... \n", "143 Dissection: Cerebral cortex (Cx) - Occipitotem... 
\n", "160 Dissection: Cerebral cortex (Cx) - Perirhinal ... \n", "165 Dissection: Tail of Hippocampus (HiT) - Caudal... \n", "175 Supercluster: Medium spiny neuron \n", "176 Dissection: Cerebral cortex (Cx) - Temporal po... \n", "177 Supercluster: Amygdala excitatory \n", "178 Supercluster: CGE interneuron \n", "183 Dissection: Head of hippocampus (HiH) - CA1 \n", "196 Dissection: Midbrain (M) - Periaqueductal gray... \n", "197 Dissection: Epithalamus - ETH \n", "206 Dissection: Pons (Pn) - Pontine reticular form... \n", "208 Supercluster: Hippocampal CA1-3 \n", "220 Dissection: Midbrain (M) - Inferior colliculus... \n", "243 Dissection: Head of hippocampus (HiH) - CA1-3 \n", "245 Supercluster: Splatter \n", "249 Dissection: Hypothalamus (HTH) - supraoptic re... \n", "270 Dissection: Myelencephalon (medulla oblongata)... \n", "273 Supercluster: Upper-layer intratelencephalic \n", "475 Tabula Sapiens - All Cells \n", "476 Tabula Sapiens - Stromal \n", "477 Tabula Sapiens - Immune \n", "478 Tabula Sapiens - Endothelial \n", "479 Tabula Sapiens - Epithelial \n", "\n", " dataset_h5ad_path dataset_total_cell_count \n", "5 ff45e623-7f5f-46e3-b47d-56be0341f66b.h5ad 13497 \n", "6 f01bdd17-4902-40f5-86e3-240d66dd2587.h5ad 27199 \n", "7 e6a11140-2545-46bc-929e-da243eed2cae.h5ad 11505 \n", "8 e5c63d94-593c-4338-a489-e1048599e751.h5ad 24583 \n", "9 d8732da6-8d1d-42d9-b625-f2416c30054b.h5ad 9522 \n", "11 cee11228-9f0b-4e57-afe2-cfe15ee56312.h5ad 34004 \n", "12 a357414d-2042-4eb5-95f0-c58604a18bdd.h5ad 12467 \n", "14 a0754256-f44b-4c4a-962c-a552e47d3fdc.h5ad 10650 \n", "15 983d5ec9-40e8-4512-9e65-a572a9c486cb.h5ad 50115 \n", "19 5e5e7a2f-8f1c-42ac-90dc-b4f80f38e84c.h5ad 20263 \n", "20 55cf0ea3-9d2b-4294-871e-bb4b49a79fc7.h5ad 15020 \n", "21 4f1555bc-4664-46c3-a606-78d34dd10d92.h5ad 12297 \n", "23 2423ce2c-3149-4cca-a2ff-cf682ea29b5f.h5ad 9641 \n", "24 1c9eb291-6d31-47e1-96b2-129b5e1ae64f.h5ad 30746 \n", "25 18eb630b-a754-4111-8cd4-c24ec80aa5ec.h5ad 53275 \n", "26 
0d2ee4ac-05ee-40b2-afb6-ebb584caa867.h5ad 35682 \n", "27 0ced5e76-6040-47ff-8a72-93847965afc0.h5ad 33664 \n", "43 8e10f1c4-8e98-41e5-b65f-8cd89a887122.h5ad 2480956 \n", "139 fe1a73ab-a203-45fd-84e9-0f7fd19efcbd.h5ad 35285 \n", "143 f8dda921-5fb4-4c94-a654-c6fc346bfd6d.h5ad 31899 \n", "160 dd03ce70-3243-4c96-9561-330cc461e4d7.h5ad 23732 \n", "165 d2b5efc1-14c6-4b5f-bd98-40f9084872d7.h5ad 36886 \n", "175 c4b03352-af8d-492a-8d6b-40f304e0a122.h5ad 152189 \n", "176 c2aad8fc-b63b-4f9b-9cfd-baf7bc9c1771.h5ad 37642 \n", "177 c202b243-1aa1-4b16-bc9a-b36241f3b1e3.h5ad 109452 \n", "178 bdb26abd-f4ba-4ea3-8862-c2340e7a4f55.h5ad 227671 \n", "183 acae7679-d077-461c-b857-ee6ccfeb267f.h5ad 39147 \n", "196 9372df2d-13d6-4fac-980b-919a5b7eb483.h5ad 33794 \n", "197 93131426-0124-4ab4-a013-9dfbcd99d467.h5ad 24327 \n", "206 7c1c3d47-3166-43e5-9a95-65ceb2d45f78.h5ad 49512 \n", "208 7a0a8891-9a22-4549-a55b-c2aca23c3a2a.h5ad 74979 \n", "220 5e5ab909-f73f-4b57-98a0-6d2c5662f6a4.h5ad 32306 \n", "243 3f56901c-dd4a-47d6-b60b-7b0c0111cfb2.h5ad 37911 \n", "245 3a7f3ab4-a280-4b3b-b2c0-6dd05614a78c.h5ad 291833 \n", "249 35c8a04c-8639-4d15-8228-765d8d93fc96.h5ad 16753 \n", "270 07b1d7c8-5c2e-42f7-9246-26f746cd6013.h5ad 27210 \n", "273 0325478a-9b52-45b5-b40a-2e2ab0d72eb1.h5ad 455006 \n", "475 53d208b0-2cfd-4366-9866-c3c6114081bc.h5ad 483152 \n", "476 a68b64d8-aee3-4947-81b7-36b8fe5a44d2.h5ad 82478 \n", "477 c5d88abe-f23a-45fa-a534-788985e93dad.h5ad 264824 \n", "478 5a11f879-d1ef-458a-910c-9b0bdfca5ebf.h5ad 31691 \n", "479 97a17473-e2b1-4f31-a544-44a60773e2dd.h5ad 104148 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Grab the feature's soma_joinid from the var dataframe\n", "var_joinid = var_df.loc[var_df.feature_id == \"ENSG00000286096\"].soma_joinid\n", "\n", "# The presence matrix is indexed by the joinids of the dataset and var dataframes,\n", "# so slice out the feature of interest by its joinid.\n", "dataset_joinids = presence_matrix[:, 
var_joinid].tocoo().row\n", "\n", "# From the datasets dataframe, slice out the datasets which have a joinid in the list\n", "datasets_df.loc[datasets_df.soma_joinid.isin(dataset_joinids)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Identifying all genes measured in a dataset\n", "\n", "Finally, we can find the set of genes that were measured in the cells of a given dataset. The example below selects datasets by `collection_id`, so if the collection holds several datasets the result is every gene measured in at least one of them." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2023-07-28T14:31:47.419947Z", "iopub.status.busy": "2023-07-28T14:31:47.419713Z", "iopub.status.idle": "2023-07-28T14:31:47.431270Z", "shell.execute_reply": "2023-07-28T14:31:47.430798Z" } }, "outputs": [ { "data": { "text/html": [ "
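<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>soma_joinid</th>\n", "      <th>feature_id</th>\n", "      <th>feature_name</th>\n", "      <th>feature_length</th>\n", "    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>0</td>\n", "      <td>ENSG00000121410</td>\n", "      <td>A1BG</td>\n", "      <td>3999</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>1</td>\n", "      <td>ENSG00000268895</td>\n", "      <td>A1BG-AS1</td>\n", "      <td>3374</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>2</td>\n", "      <td>ENSG00000148584</td>\n", "      <td>A1CF</td>\n", "      <td>9603</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>3</td>\n", "      <td>ENSG00000175899</td>\n", "      <td>A2M</td>\n", "      <td>6318</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>4</td>\n", "      <td>ENSG00000245105</td>\n", "      <td>A2M-AS1</td>\n", "      <td>2948</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>...</th>\n", "      <td>...</td>\n", "      <td>...</td>\n", "      <td>...</td>\n", "      <td>...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>58109</th>\n", "      <td>58109</td>\n", "      <td>ENSG00000277745</td>\n", "      <td>H2AB3</td>\n", "      <td>591</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>58354</th>\n", "      <td>58354</td>\n", "      <td>ENSG00000233522</td>\n", "      <td>FAM224A</td>\n", "      <td>2031</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>58411</th>\n", "      <td>58411</td>\n", "      <td>ENSG00000183146</td>\n", "      <td>PRORY</td>\n", "      <td>878</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>58523</th>\n", "      <td>58523</td>\n", "      <td>ENSG00000279274</td>\n", "      <td>RP11-533E23.2</td>\n", "      <td>75</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>58632</th>\n", "      <td>58632</td>\n", "      <td>ENSG00000277836</td>\n", "      <td>ENSG00000277836.1</td>\n", "      <td>288</td>\n", "    </tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>27211 rows × 4 columns</p>\n",
"</div>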
" ], "text/plain": [ " soma_joinid feature_id feature_name feature_length\n", "0 0 ENSG00000121410 A1BG 3999\n", "1 1 ENSG00000268895 A1BG-AS1 3374\n", "2 2 ENSG00000148584 A1CF 9603\n", "3 3 ENSG00000175899 A2M 6318\n", "4 4 ENSG00000245105 A2M-AS1 2948\n", "... ... ... ... ...\n", "58109 58109 ENSG00000277745 H2AB3 591\n", "58354 58354 ENSG00000233522 FAM224A 2031\n", "58411 58411 ENSG00000183146 PRORY 878\n", "58523 58523 ENSG00000279274 RP11-533E23.2 75\n", "58632 58632 ENSG00000277836 ENSG00000277836.1 288\n", "\n", "[27211 rows x 4 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Slice the dataset(s) of interest, and get the joinid(s)\n", "dataset_joinids = datasets_df.loc[datasets_df.collection_id == \"17481d16-ee44-49e5-bcf0-28c0780d8c4a\"].soma_joinid\n", "\n", "# Slice the presence matrix by the first dimension, i.e., by dataset\n", "var_joinids = presence_matrix[dataset_joinids, :].tocoo().col\n", "\n", "# From the feature (var) dataframe, slice out features which have a joinid in the list.\n", "var_df.loc[var_df.soma_joinid.isin(var_joinids)]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.10" }, "vscode": { "interpreter": { "hash": "3da8ec1c162cd849e59e6ea2824b2e353dce799884e910aae99411be5277f953" } } }, "nbformat": 4, "nbformat_minor": 2 }