{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "YkHzXhTHvXUc" }, "source": [ "# Querying data using the gget cellxgene module\n", "\n", "*By Laura Luebbert, lauraluebbert@caltech.edu.*\n", "\n", "[gget](https://github.com/pachterlab/gget) is a free, open-source command-line tool and Python package that enables efficient querying of genomic databases. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.\n", "\n", "The [gget cellxgene](https://pachterlab.github.io/gget/cellxgene.html) module builds on the [CZ CELLxGENE Discover Census](https://chanzuckerberg.github.io/cellxgene-census/) to query data from [CZ CELLxGENE Discover](https://cellxgene.cziscience.com/). This notebook briefly introduces the [gget cellxgene](https://pachterlab.github.io/gget/cellxgene.html) module by providing one simple example for each supported query type.\n", "\n", "If you use gget cellxgene in a publication, please [cite gget](https://pachterlab.github.io/gget/cite.html) in addition to [citing CZ CELLxGENE](https://cellxgene.cziscience.com/docs/08__Cite%20cellxgene%20in%20your%20publications).\n", "\n", "You can also [open this notebook in Google Colab](https://colab.research.google.com/github/chanzuckerberg/cellxgene-census/blob/main/api/python/notebooks/api_demo/census_gget_demo.ipynb).\n", "\n", "**Contents** \n", "\n", "1. Install gget and set up the cellxgene module.\n", "2. Fetch an [AnnData](https://anndata.readthedocs.io/en/latest/) object by selecting gene(s), tissue(s) and cell type(s).\n", "3. Create a dot plot similar to those shown in the CZ CELLxGENE Discover [Gene Expression](https://cellxgene.cziscience.com/gene-expression) tool.\n", "4. Fetch only cell metadata (corresponding to `AnnData.obs`).\n", "5. Use [gget cellxgene](https://pachterlab.github.io/gget/cellxgene.html) from the command line." 
] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "gaPshWPxwzo9" }, "source": [ "## Install gget and set up the cellxgene module" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2023-07-28T16:16:11.920035Z", "iopub.status.busy": "2023-07-28T16:16:11.919631Z", "iopub.status.idle": "2023-07-28T16:16:22.202695Z", "shell.execute_reply": "2023-07-28T16:16:22.202036Z" }, "id": "bKTgv7hCQxS1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip available: \u001b[0m\u001b[31;49m22.3.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.2.1\u001b[0m\r\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\r\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Fri Jul 28 16:16:17 2023 INFO Installing cellxgene-census package (requires pip).\n", "Fri Jul 28 16:16:22 2023 INFO cellxgene_census installed succesfully.\n" ] } ], "source": [ "# The cellxgene module was added to gget in version 0.25.7\n", "# Quote the requirement so the shell does not parse '>' as an output redirection\n", "!pip install -q \"gget>=0.25.7\"\n", "\n", "import gget\n", "\n", "gget.setup(\"cellxgene\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2023-07-28T16:16:22.205662Z", "iopub.status.busy": "2023-07-28T16:16:22.205208Z", "iopub.status.idle": "2023-07-28T16:16:22.209709Z", "shell.execute_reply": "2023-07-28T16:16:22.209150Z" }, "id": "f4hLtaBPToWG", "outputId": "e81057ba-578a-464b-caa5-32edcdd206b4" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function cellxgene in module gget.gget_cellxgene:\n", "\n", 
"cellxgene(species='homo_sapiens', gene=None, ensembl=False, column_names=['dataset_id', 'assay', 'suspension_type', 'sex', 
'tissue_general', 'tissue', 'cell_type'], meta_only=False, tissue=None, cell_type=None, development_stage=None, disease=None, sex=None, is_primary_data=True, dataset_id=None, tissue_general_ontology_term_id=None, tissue_general=None, assay_ontology_term_id=None, assay=None, cell_type_ontology_term_id=None, development_stage_ontology_term_id=None, disease_ontology_term_id=None, donor_id=None, self_reported_ethnicity_ontology_term_id=None, self_reported_ethnicity=None, sex_ontology_term_id=None, suspension_type=None, tissue_ontology_term_id=None, census_version='stable', verbose=True, out=None)\n", " Query data from CZ CELLxGENE Discover (https://cellxgene.cziscience.com/) using the\n", " CZ CELLxGENE Discover Census (https://github.com/chanzuckerberg/cellxgene-census).\n", " \n", " NOTE: Querying large datasets requires a large amount of RAM. Use the cell metadata attributes\n", " to define the (sub)dataset of interest.\n", " The CZ CELLxGENE Discover Census recommends >16 GB of memory and a >5 Mbps internet connection.\n", " \n", " General args:\n", " - species Choice of 'homo_sapiens' or 'mus_musculus'. Default: 'homo_sapiens'.\n", " - gene Str or list of gene name(s) or Ensembl ID(s), e.g. ['ACE2', 'SLC5A1'] or ['ENSG00000130234', 'ENSG00000100170']. Default: None.\n", " NOTE: Set ensembl=True when providing Ensembl ID(s) instead of gene name(s).\n", " See https://cellxgene.cziscience.com/gene-expression for examples of available genes.\n", " - ensembl True/False (default: False). Set to True when genes are provided as Ensembl IDs.\n", " - column_names List of metadata columns to return (stored in AnnData.obs when meta_only=False).\n", " Default: [\"dataset_id\", \"assay\", \"suspension_type\", \"sex\", \"tissue_general\", \"tissue\", \"cell_type\"]\n", " For more options see: https://api.cellxgene.cziscience.com/curation/ui/#/ -> Schemas -> dataset\n", " - meta_only True/False (default: False). 
If True, returns only metadata dataframe (corresponds to AnnData.obs).\n", " - census_version Str defining version of Census, e.g. \"2023-05-15\" or \"latest\" or \"stable\". Default: \"stable\".\n", " - verbose True/False whether to print progress information. Default True.\n", " - out If provided, saves the generated AnnData h5ad (or csv when meta_only=True) file with the specified path. Default: None.\n", " \n", " Cell metadata attributes:\n", " - tissue Str or list of tissue(s), e.g. ['lung', 'blood']. Default: None.\n", " See https://cellxgene.cziscience.com/gene-expression for examples of available tissues.\n", " - cell_type Str or list of celltype(s), e.g. ['mucus secreting cell', 'neuroendocrine cell']. Default: None.\n", " See https://cellxgene.cziscience.com/gene-expression and select a tissue to see examples of available celltypes.\n", " - development_stage Str or list of development stage(s). Default: None.\n", " - disease Str or list of disease(s). Default: None.\n", " - sex Str or list of sex(es), e.g. 'female'. Default: None.\n", " - is_primary_data True/False (default: True). If True, returns only the canonical instance of the cellular observation.\n", " This is commonly set to False for meta-analyses reusing data or for secondary views of data.\n", " - dataset_id Str or list of CELLxGENE dataset ID(s). Default: None.\n", " - tissue_general_ontology_term_id Str or list of high-level tissue UBERON ID(s). Default: None.\n", " Also see: https://github.com/chanzuckerberg/single-cell-data-portal/blob/9b94ccb0a2e0a8f6182b213aa4852c491f6f6aff/backend/wmg/data/tissue_mapper.py\n", " - tissue_general Str or list of high-level tissue label(s). Default: None.\n", " Also see: https://github.com/chanzuckerberg/single-cell-data-portal/blob/9b94ccb0a2e0a8f6182b213aa4852c491f6f6aff/backend/wmg/data/tissue_mapper.py\n", " - tissue_ontology_term_id Str or list of tissue ontology term ID(s) as defined in the CELLxGENE dataset schema. 
Default: None.\n", " - assay_ontology_term_id Str or list of assay ontology term ID(s) as defined in the CELLxGENE dataset schema. Default: None.\n", " - assay Str or list of assay(s) as defined in the CELLxGENE dataset schema. Default: None.\n", " - cell_type_ontology_term_id Str or list of celltype ontology term ID(s) as defined in the CELLxGENE dataset schema. Default: None.\n", " - development_stage_ontology_term_id Str or list of development stage ontology term ID(s) as defined in the CELLxGENE dataset schema. Default: None.\n", " - disease_ontology_term_id Str or list of disease ontology term ID(s) as defined in the CELLxGENE dataset schema. Default: None.\n", " - donor_id Str or list of donor ID(s) as defined in the CELLxGENE dataset schema. Default: None.\n", " - self_reported_ethnicity_ontology_term_id Str or list of self reported ethnicity ontology ID(s) as defined in the CELLxGENE dataset schema. Default: None.\n", " - self_reported_ethnicity Str or list of self reported ethnicity as defined in the CELLxGENE dataset schema. Default: None.\n", " - sex_ontology_term_id Str or list of sex ontology ID(s) as defined in the CELLxGENE dataset schema. Default: None.\n", " - suspension_type Str or list of suspension type(s) as defined in the CELLxGENE dataset schema. Default: None.\n", " \n", " Returns AnnData object (when meta_only=False) or dataframe (when meta_only=True).\n", "\n" ] } ], "source": [ "# Display all options of the cellxgene gget module\n", "help(gget.cellxgene)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "6j6vqbOXw9X3" }, "source": [ "## Fetch an [AnnData](https://anndata.readthedocs.io/en/latest/) object by selecting gene(s), tissue(s) and cell type(s)\n", "You can use all of the options listed above to filter for data of interest. 
Here, we will demonstrate the module by fetching a small dataset containing only three genes and two lung cell types:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2023-07-28T16:16:22.212324Z", "iopub.status.busy": "2023-07-28T16:16:22.211965Z", "iopub.status.idle": "2023-07-28T16:16:43.724382Z", "shell.execute_reply": "2023-07-28T16:16:43.723763Z" }, "id": "OnDHwjjSQ2uD", "outputId": "9cdb068d-300a-4cdc-f0cb-cf62d0eb3097" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Fri Jul 28 16:16:22 2023 INFO Fetching AnnData object from CZ CELLxGENE Discover. This might take a few minutes...\n", "The \"stable\" release is currently 2023-07-25. Specify 'census_version=\"2023-07-25\"' in future calls to open_soma() to ensure data consistency.\n", "Fri Jul 28 16:16:22 2023 INFO The \"stable\" release is currently 2023-07-25. Specify 'census_version=\"2023-07-25\"' in future calls to open_soma() to ensure data consistency.\n" ] } ], "source": [ "# Fetch AnnData object based on specified genes, tissue and cell types\n", "adata = gget.cellxgene(\n", " gene=[\"ACE2\", \"ABCA1\", \"SLC5A1\"], tissue=\"lung\", cell_type=[\"mucus secreting cell\", \"neuroendocrine cell\"]\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "6krDnFMLyeRl" }, "source": [ "Let's look at some of the features of the AnnData object we just fetched:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2023-07-28T16:16:43.727629Z", "iopub.status.busy": "2023-07-28T16:16:43.727173Z", "iopub.status.idle": "2023-07-28T16:16:43.733167Z", "shell.execute_reply": "2023-07-28T16:16:43.732546Z" }, "id": "MqZM-2uNTt1L", "outputId": "7b14f3e9-8264-428d-b6d4-3320e17776d5" }, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 3679 × 
3\n", " obs: 'dataset_id', 'assay', 'suspension_type', 'sex', 'tissue_general', 'tissue', 'cell_type', 'is_primary_data'\n", " var: 'soma_joinid', 'feature_id', 'feature_name', 'feature_length'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "Yl34ulk7ziH8" }, "source": [ "A few thousand lung mucus secreting and neuroendocrine cells from CZ CELLxGENE Discover (3,679 in this run) matched the filters specified above, and their expression matrix for ACE2, ABCA1, and SLC5A1 was fetched. The `.var` and `.obs` dataframes contain additional information about each gene and each cell, respectively:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 143 }, "execution": { "iopub.execute_input": "2023-07-28T16:16:43.735745Z", "iopub.status.busy": "2023-07-28T16:16:43.735429Z", "iopub.status.idle": "2023-07-28T16:16:43.742939Z", "shell.execute_reply": "2023-07-28T16:16:43.742344Z" }, "id": "qzdc41PdTwDN", "outputId": "a1a48a86-14bc-4559-ab3b-19d7f29fa153" }, "outputs": [ { "data": {
"text/plain": [ " soma_joinid feature_id feature_name feature_length\n", "0 38 ENSG00000165029 ABCA1 11343\n", "1 5332 ENSG00000130234 ACE2 9739\n", "2 24539 ENSG00000100170 SLC5A1 5081" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata.var" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "execution": { "iopub.execute_input": "2023-07-28T16:16:43.745331Z", "iopub.status.busy": "2023-07-28T16:16:43.745024Z", "iopub.status.idle": "2023-07-28T16:16:43.754989Z", "shell.execute_reply": "2023-07-28T16:16:43.754357Z" }, "id": "lIebiJ0CTxDn", "outputId": "dcf64476-daf1-4355-ab77-724e54cbacbe" }, "outputs": [ { "data": {
"text/plain": [ " dataset_id assay suspension_type \\\n", "0 9f222629-9e39-47d0-b83f-e08d610c7479 10x 3' v2 cell \n", "1 9f222629-9e39-47d0-b83f-e08d610c7479 10x 3' v2 cell \n", "2 9f222629-9e39-47d0-b83f-e08d610c7479 10x 3' v2 cell \n", "3 9f222629-9e39-47d0-b83f-e08d610c7479 10x 3' v2 cell \n", "4 9f222629-9e39-47d0-b83f-e08d610c7479 10x 3' v2 cell \n", "... ... ... ... \n", "3674 8c42cfd0-0b0a-46d5-910c-fc833d83c45e 10x 3' v2 cell \n", "3675 8c42cfd0-0b0a-46d5-910c-fc833d83c45e 10x 3' v2 cell \n", "3676 8c42cfd0-0b0a-46d5-910c-fc833d83c45e 10x 3' v2 cell \n", "3677 8c42cfd0-0b0a-46d5-910c-fc833d83c45e 10x 3' v2 cell \n", "3678 8c42cfd0-0b0a-46d5-910c-fc833d83c45e 10x 3' v2 cell \n", "\n", " sex tissue_general tissue cell_type is_primary_data \n", "0 unknown lung lung mucus secreting cell True \n", "1 unknown lung lung mucus secreting cell True \n", "2 unknown lung lung mucus secreting cell True \n", "3 unknown lung lung mucus secreting cell True \n", "4 unknown lung lung mucus secreting cell True \n", "... ... ... ... ... ... 
\n", "3674 female lung lung mucus secreting cell True \n", "3675 female lung lung mucus secreting cell True \n", "3676 female lung lung mucus secreting cell True \n", "3677 female lung lung mucus secreting cell True \n", "3678 female lung lung mucus secreting cell True \n", "\n", "[3679 rows x 8 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata.obs" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "e4zaccqDUVon" }, "source": [ "## Create a dot plot similar to those shown in the CZ CELLxGENE Discover [Gene Expression](https://cellxgene.cziscience.com/gene-expression) tool\n", "Using the data we just fetched, we can create a dot plot with [scanpy](https://scanpy.readthedocs.io/en/stable/):" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2023-07-28T16:16:43.757485Z", "iopub.status.busy": "2023-07-28T16:16:43.757177Z", "iopub.status.idle": "2023-07-28T16:16:44.018040Z", "shell.execute_reply": "2023-07-28T16:16:44.017433Z" }, "id": "qGHq2q3wT3gw" }, "outputs": [], "source": [ "import scanpy as sc\n", "\n", "# The 'retina' format increases the resolution of plots displayed in notebooks\n", "%config InlineBackend.figure_format=\"retina\"" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 283 }, "execution": { "iopub.execute_input": "2023-07-28T16:16:44.021494Z", "iopub.status.busy": "2023-07-28T16:16:44.020733Z", "iopub.status.idle": "2023-07-28T16:16:44.242741Z", "shell.execute_reply": "2023-07-28T16:16:44.242197Z" }, "id": "p1FTT-OiUa4k", "outputId": "ca64020b-6c2e-4f61-973f-657bbefa078c" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "... storing 'dataset_id' as categorical\n", "... storing 'assay' as categorical\n", "... storing 'suspension_type' as categorical\n", "... storing 'sex' as categorical\n", "... 
storing 'tissue_general' as categorical\n", "... 
storing 'tissue' as categorical\n", "... storing 'cell_type' as categorical\n" ] }, { "data": { "text/plain": [ "[Figure: scanpy dot plot of ACE2, ABCA1, and SLC5A1 expression grouped by cell type]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sc.pl.dotplot(adata, adata.var[\"feature_name\"].values, groupby=\"cell_type\", gene_symbols=\"feature_name\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "lIqvA3pc0iJA" }, "source": [ "## Fetch only cell metadata (corresponds to AnnData.obs)\n", "By setting `meta_only=True` and again filtering by the cell metadata attributes listed above, you can also fetch only the cell metadata:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "execution": { "iopub.execute_input": "2023-07-28T16:16:44.245498Z", "iopub.status.busy": "2023-07-28T16:16:44.245237Z", "iopub.status.idle": "2023-07-28T16:16:45.534877Z", "shell.execute_reply": "2023-07-28T16:16:45.534303Z" }, "id": "RQJyn-mKU_oh", "outputId": "feb8e251-d619-491b-88a0-2cda9adc2af6" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Fri Jul 28 16:16:44 2023 INFO Fetching metadata from CZ CELLxGENE Discover...\n" ] }, { "data": {
"text/plain": [ " dataset_id assay suspension_type \\\n", "0 047d57f2-4d14-45de-aa98-336c6f583750 10x 3' v2 cell \n", "1 047d57f2-4d14-45de-aa98-336c6f583750 10x 3' v2 cell \n", "2 047d57f2-4d14-45de-aa98-336c6f583750 10x 3' v2 cell \n", "3 047d57f2-4d14-45de-aa98-336c6f583750 10x 3' v2 cell \n", "4 047d57f2-4d14-45de-aa98-336c6f583750 10x 3' v2 cell \n", "... ... ... ... \n", "97547 48b37086-25f7-4ecd-be66-f5bb378e3aea 10x 3' v2 cell \n", "97548 48b37086-25f7-4ecd-be66-f5bb378e3aea 10x 3' v2 cell \n", "97549 48b37086-25f7-4ecd-be66-f5bb378e3aea 10x 3' v2 cell \n", "97550 48b37086-25f7-4ecd-be66-f5bb378e3aea 10x 3' v2 cell \n", "97551 48b37086-25f7-4ecd-be66-f5bb378e3aea 10x 3' v2 cell \n", "\n", " sex tissue_general tissue cell_type \\\n", "0 unknown lung lung mesenchymal stem cell \n", "1 unknown lung lung progenitor cell \n", "2 unknown lung lung mesenchymal cell \n", "3 unknown lung lung mesenchymal stem cell \n", "4 unknown lung lung mesenchymal cell \n", "... ... ... ... ... \n", "97547 male lung lung fibroblast of lung \n", "97548 male lung lung natural killer cell \n", "97549 male lung lung pulmonary interstitial fibroblast \n", "97550 male lung lung adventitial cell \n", "97551 male lung lung fibroblast of lung \n", "\n", " is_primary_data \n", "0 True \n", "1 True \n", "2 True \n", "3 True \n", "4 True \n", "... ... 
\n", "97547 True \n", "97548 True \n", "97549 True \n", "97550 True \n", "97551 True \n", "\n", "[97552 rows x 8 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = gget.cellxgene(\n", " meta_only=True,\n", " census_version=\"2023-05-15\", # Specify Census version for reproducibility over time\n", " gene=\"ENSMUSG00000015405\",\n", " ensembl=True, # Setting 'ensembl=True' here since the gene is passed as an Ensembl ID\n", " tissue=\"lung\",\n", " species=\"mus_musculus\", # Let's switch up the species\n", ")\n", "\n", "df" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "2QjJEJdS-He7" }, "source": [ "## Use [gget cellxgene](https://pachterlab.github.io/gget/cellxgene.html) from the command line\n", "All gget modules support use from the command line. Note that the command line interface requires the `-o/--out` argument to specify a path to save the fetched data. Here are the command line versions of the queries demonstrated above:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2023-07-28T16:16:45.537414Z", "iopub.status.busy": "2023-07-28T16:16:45.537157Z", "iopub.status.idle": "2023-07-28T16:16:45.539753Z", "shell.execute_reply": "2023-07-28T16:16:45.539253Z" }, "id": "hDcS0fZ--BnB" }, "outputs": [], "source": [ "# # Fetch AnnData object based on specified genes, tissue and cell types\n", "# !gget cellxgene --gene ACE2 ABCA1 SLC5A1 --tissue lung --cell_type 'mucus secreting cell' 'neuroendocrine cell' -o example_adata.h5ad" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2023-07-28T16:16:45.542060Z", "iopub.status.busy": "2023-07-28T16:16:45.541662Z", "iopub.status.idle": "2023-07-28T16:16:45.544200Z", "shell.execute_reply": "2023-07-28T16:16:45.543724Z" }, "id": "f683tvIg-oEz" }, "outputs": [], "source": [ "# # Fetch only metadata\n", "# !gget cellxgene --meta_only --gene 
ENSMUSG00000015405 --ensembl --tissue lung --species mus_musculus -o example_meta.csv" ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.10" } }, "nbformat": 4, "nbformat_minor": 1 }