{ "cells": [ { "cell_type": "markdown", "id": "88812eae-6b46-48b4-a1e4-c468657d8480", "metadata": {}, "source": [ "# Generating citations for Census slices\n", "\n", "This notebook demonstrates how to generate a citation string for all datasets contained in a Census slice.\n", "\n", "**Contents**\n", "\n", "1. Requirements\n", "1. Generating citation strings\n", "   1. Via cell metadata query\n", "   1. Via AnnData query\n", "\n", "⚠️ Note that the Census RNA data includes duplicate cells present across multiple datasets. Duplicate cells can be filtered in or out using the cell metadata variable `is_primary_data`, which is described in the [Census schema](https://github.com/chanzuckerberg/cellxgene-census/blob/main/docs/cellxgene_census_schema.md#repeated-data).\n", "\n", "## Requirements\n", "\n", "This notebook requires:\n", "\n", "- The `cellxgene_census` Python package.\n", "- A Census data release with [schema version](https://github.com/chanzuckerberg/cellxgene-census/blob/main/docs/cellxgene_census_schema.md) 1.3.0 or greater.\n", "\n", "## Generating citation strings\n", "\n", "First we open a handle to the Census data. To ensure we open a data release with schema version 1.3.0 or greater, we use `census_version=\"latest\"`." ] }, { "cell_type": "code", "execution_count": 1, "id": "9a5a5a92-3d78-4542-95a5-e6889f245491", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr><th></th><th>soma_joinid</th><th>label</th><th>value</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>0</td><td>census_schema_version</td><td>1.3.0</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>census_build_date</td><td>2024-01-01</td></tr>\n",
"    <tr><th>2</th><td>2</td><td>dataset_schema_version</td><td>4.0.0</td></tr>\n",
"    <tr><th>3</th><td>3</td><td>total_cell_count</td><td>75694072</td></tr>\n",
"    <tr><th>4</th><td>4</td><td>unique_cell_count</td><td>45846761</td></tr>\n",
"    <tr><th>5</th><td>5</td><td>number_donors_homo_sapiens</td><td>16292</td></tr>\n",
"    <tr><th>6</th><td>6</td><td>number_donors_mus_musculus</td><td>2153</td></tr>\n",
"  </tbody>\n", "</table>
\n", "
" ], "text/plain": [ " soma_joinid label value\n", "0 0 census_schema_version 1.3.0\n", "1 1 census_build_date 2024-01-01\n", "2 2 dataset_schema_version 4.0.0\n", "3 3 total_cell_count 75694072\n", "4 4 unique_cell_count 45846761\n", "5 5 number_donors_homo_sapiens 16292\n", "6 6 number_donors_mus_musculus 2153" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import cellxgene_census\n", "\n", "# Open a handle to the latest Census release and display its summary table\n", "census = cellxgene_census.open_soma(census_version=\"latest\")\n", "census[\"census_info\"][\"summary\"].read().concat().to_pandas()" ] }, { "cell_type": "markdown", "id": "23174644-7804-4723-a4ab-c5cf75bdd954", "metadata": {}, "source": [ "Then we load the datasets table, which contains a `\"citation\"` column with a citation string for each dataset included in Census." ] }, { "cell_type": "code", "execution_count": 2, "id": "d47b636a-d653-4e3b-b139-14b6ca697ce8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Dataset Version: https://datasets.cellxgene.cz...\n", "1 Dataset Version: https://datasets.cellxgene.cz...\n", "2 Dataset Version: https://datasets.cellxgene.cz...\n", "3 Dataset Version: https://datasets.cellxgene.cz...\n", "4 Publication: https://doi.org/10.1002/ctm2.1356...\n", " ... 
\n", "695 Publication: https://doi.org/10.1038/s41586-02...\n", "696 Publication: https://doi.org/10.1038/s41586-02...\n", "697 Publication: https://doi.org/10.1016/j.isci.20...\n", "698 Publication: https://doi.org/10.1371/journal.p...\n", "699 Publication: https://doi.org/10.1016/j.isci.20...\n", "Name: citation, Length: 700, dtype: object" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datasets = census[\"census_info\"][\"datasets\"].read().concat().to_pandas()\n", "datasets[\"citation\"]" ] }, { "cell_type": "markdown", "id": "06adfa4a-3656-4f26-9adf-ba28eb2f691e", "metadata": {}, "source": [ "And now we can use the column `\"dataset_id\"` present in both the dataset table and the Census cell metadata to create citation strings for any Census slice.\n", "\n", "### Via cell metadata query" ] }, { "cell_type": "code", "execution_count": 3, "id": "f7edf4a7-8394-4df2-9dde-b24efcd6dbe0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/4866a804-37eb-436f-8c87-9cd585260061.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", "\n", "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/bfd80f12-725c-4482-ad7f-1ed2b4909b0d.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", "\n", "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/e6df8a57-f54f-413a-9d4d-dee03294d778.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", "\n", "Publication: 
https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8d599205-5c51-4b50-9d48-3dec31238587.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", "\n", "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/f6065c51-bd26-4aa5-a05d-2805aeea48d9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", "\n", "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8cdbf790-4d29-4f46-9aef-21adfb2e21da.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n" ] } ], "source": [ "# Query cell metadata\n", "cell_metadata = census[\"census_data\"][\"homo_sapiens\"].obs.read(\n", " value_filter=\"tissue == 'cardiac atrium'\", column_names=[\"dataset_id\", \"cell_type\"]\n", ")\n", "cell_metadata = cell_metadata.concat().to_pandas()\n", "\n", "# Get a citation string for the slice\n", "slice_datasets = datasets[datasets[\"dataset_id\"].isin(cell_metadata[\"dataset_id\"])]\n", "print(*slice_datasets[\"citation\"], sep=\"\\n\\n\")" ] }, { "cell_type": "markdown", "id": "941c6f2d-6bc2-46c3-963e-e74335fe93f6", "metadata": {}, "source": [ "### Via AnnData query" ] }, { "cell_type": "code", "execution_count": 4, "id": "2d9b2a11-2f48-43a5-8955-759019ce6bed", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/4866a804-37eb-436f-8c87-9cd585260061.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: 
https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", "\n", "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/bfd80f12-725c-4482-ad7f-1ed2b4909b0d.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", "\n", "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/e6df8a57-f54f-413a-9d4d-dee03294d778.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", "\n", "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8d599205-5c51-4b50-9d48-3dec31238587.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", "\n", "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/f6065c51-bd26-4aa5-a05d-2805aeea48d9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", "\n", "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8cdbf790-4d29-4f46-9aef-21adfb2e21da.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n" ] } ], "source": [ "# Fetch an AnnData object\n", "adata = cellxgene_census.get_anndata(\n", "    census=census,\n", "    organism=\"homo_sapiens\",\n", "    measurement_name=\"RNA\",\n", "    obs_value_filter=\"tissue == 'cardiac atrium'\",\n", "    var_value_filter=\"feature_name == 'MYBPC3'\",\n", "    column_names={\"obs\": [\"dataset_id\", \"cell_type\"]},\n", ")\n", 
"\n", "# Get a citation string for the slice\n", "slice_datasets = datasets[datasets[\"dataset_id\"].isin(adata.obs[\"dataset_id\"])]\n", "print(*slice_datasets[\"citation\"], sep=\"\\n\\n\")" ] }, { "cell_type": "markdown", "id": "f6988186-5294-43f9-bfe5-2ac255aa0b26", "metadata": {}, "source": [ "And don't forget to close the Census handle." ] }, { "cell_type": "code", "execution_count": 6, "id": "f96b1c3b-4a2a-469a-9ded-1c5ff98b84aa", "metadata": {}, "outputs": [], "source": [ "census.close()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 5 }