{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Normalizing full-length gene sequencing data\n", "\n", "This tutorial shows you how to fetch full-length gene sequencing data from the Census and normalize it to account for gene length.\n", "\n", "**Contents**\n", "\n", "1. Opening the census\n", "2. Fetching example full-length sequencing data (Smart-Seq2)\n", "3. Normalizing expression to account for gene length\n", "4. Validation through clustering exploration\n", "\n", "⚠️ Note that the Census RNA data includes duplicate cells present across multiple datasets. Duplicate cells can be filtered in or out using the cell metadata variable `is_primary_data`, which is described in the [Census schema](https://github.com/chanzuckerberg/cellxgene-census/blob/main/docs/cellxgene_census_schema.md#repeated-data).\n", "This notebook focuses on individual datasets, so we can ignore this variable.\n", "\n", "## Opening the census\n", "\n", "First we open the Census. If you are not familiar with the basics of the Census API, take a look at the notebook [Learning about the CZ CELLxGENE Census](https://chanzuckerberg.github.io/cellxgene-census/notebooks/analysis_demo/comp_bio_census_info.html)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2023-07-28T14:30:01.325351Z", "iopub.status.busy": "2023-07-28T14:30:01.325095Z", "iopub.status.idle": "2023-07-28T14:30:04.665645Z", "shell.execute_reply": "2023-07-28T14:30:04.665030Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The \"stable\" release is currently 2023-07-25. 
Specify 'census_version=\"2023-07-25\"' in future calls to open_soma() to ensure data consistency.\n" ] } ], "source": [ "import cellxgene_census\n", "import scanpy as sc\n", "from scipy.sparse import csr_matrix\n", "\n", "census = cellxgene_census.open_soma()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can learn more about all of the `cellxgene_census` methods by accessing their documentation via `help()`, for example `help(cellxgene_census.open_soma)`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fetching example full-length sequencing data (Smart-Seq2)\n", "\n", "Let's get some example data. In this case we'll fetch all cells from a relatively small dataset generated with Smart-Seq2, a technology that performs full-length gene sequencing:\n", "\n", "- Collection: [Tabula Muris Senis](https://cellxgene.cziscience.com/collections/0b9d8a04-bb9d-44da-aa27-705bb65b54eb)\n", "- Dataset: [Liver - A single-cell transcriptomic atlas characterizes ageing tissues in the mouse - Smart-seq2](https://cellxgene.cziscience.com/e/524179b0-b406-4723-9c46-293ffa77ca81.cxg/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's first find this dataset's ID using the dataset table of the Census." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2023-07-28T14:30:04.668922Z", "iopub.status.busy": "2023-07-28T14:30:04.668453Z", "iopub.status.idle": "2023-07-28T14:30:05.295251Z", "shell.execute_reply": "2023-07-28T14:30:05.294682Z" } }, "outputs": [ { "data": { "text/html": [ "
|  | soma_joinid | collection_id | collection_name | collection_doi | dataset_id | dataset_title | dataset_h5ad_path | dataset_total_cell_count |
|---|---|---|---|---|---|---|---|---|
| 0 | 525 | 0b9d8a04-bb9d-44da-aa27-705bb65b54eb | Tabula Muris Senis | 10.1038/s41586-020-2496-1 | 4546e757-34d0-4d17-be06-538318925fcd | Liver - A single-cell transcriptomic atlas cha... | 4546e757-34d0-4d17-be06-538318925fcd.h5ad | 2859 |