{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "c4b089f9-5304-4479-b739-20792e12595a", "metadata": {}, "source": [ "# Out-of-core (incremental) mean and variance calculation\n", "\n", "This tutorial describes how to use the `cellxgene_census.experimental.pp` API to calculate the mean and variance out of core (incrementally) in the Census. The variance is computed with [Welford's online algorithm](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance).\n", "\n", "**Contents**\n", "\n", "1. The mean and variance API.\n", "2. Example: calculate mean and variance for a slice of the Census.\n", "\n", "## The mean and variance API\n", "\n", "`mean_variance()` calculates the mean and the variance for an `ExperimentAxisQuery`. The following additional arguments are supported:\n", "\n", "- `layer`: the X layer used for the calculation; defaults to `raw`.\n", "- `axis`: the axis along which the calculation is performed. Specify 0 for the `var` axis and 1 for the `obs` axis.\n", "- `calculate_mean`: if `False`, the mean is not included in the result.\n", "- `calculate_variance`: if `False`, the variance is not computed.\n", "- `ddof`: _Delta Degrees of Freedom_; the divisor used in the variance calculation is `N - ddof`, where `N` is the number of elements.\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "a06549db-60c5-49b6-b57d-29ee6db2c61c", "metadata": { "execution": { "iopub.execute_input": "2023-07-28T16:32:46.273919Z", "iopub.status.busy": "2023-07-28T16:32:46.273614Z", "iopub.status.idle": "2023-07-28T16:32:51.270141Z", "shell.execute_reply": "2023-07-28T16:32:51.269514Z" }, "tags": [] }, "outputs": [], "source": [ "# Import packages\n", "import cellxgene_census\n", "import pandas as pd\n", "import tiledbsoma as soma\n", "from cellxgene_census.experimental.pp import mean_variance" ] },
{ "attachments": {}, "cell_type": "markdown", "id": "c5385575-d591-4c5c-8bc3-aea4ea02a8da", "metadata": { "tags": [] }, "source": [ "## Example: calculate mean and variance for a slice of the Census" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8dc17793-062b-4ac4-aaa6-32cfe8bb74ac", "metadata": { "tags": [] }, "source": [ "As an example, we'll calculate the mean and variance along the `obs` axis for a subset of cells from the mouse Census.\n", "\n", "The return value is a pandas DataFrame indexed by `soma_joinid` (here, the `soma_joinid` of `obs`) with `mean` and `variance` columns." ] }, { "cell_type": "code", "execution_count": 2, "id": "95d8bbf8-405e-40bd-90d6-366ce20a11bc", "metadata": { "execution": { "iopub.execute_input": "2023-07-28T16:32:51.273381Z", "iopub.status.busy": "2023-07-28T16:32:51.272927Z", "iopub.status.idle": "2023-07-28T16:32:58.442151Z", "shell.execute_reply": "2023-07-28T16:32:58.441609Z" }, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The \"stable\" release is currently 2023-07-25. Specify 'census_version=\"2023-07-25\"' in future calls to open_soma() to ensure data consistency.\n" ] }, { "data": { "text/html": [ "
| soma_joinid | mean | variance |
|---|---|---|
| 3095357 | 15.915025 | 69571.774917 |
| 3095359 | 5.972801 | 9471.427044 |
| 3095363 | 25.169472 | 139042.208628 |
| 3095366 | 8.049836 | 24762.926397 |
| 3095368 | 17.345415 | 150412.440839 |
| ... | ... | ... |
| 3278898 | 0.164319 | 5.339741 |
| 3278899 | 0.368339 | 24.930156 |
| 3278900 | 0.246049 | 11.886186 |
| 3278901 | 0.240724 | 10.307266 |
| 3278902 | 0.278420 | 16.086994 |

9314 rows × 2 columns
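
The `mean` and `variance` values above are computed without loading the whole matrix into memory. The following is a minimal sketch of that idea, not the implementation used by `cellxgene_census`: it applies the chunk-wise generalization of Welford's update (merging per-chunk counts, means, and sums of squared deviations) to a NumPy array. The function name `chunked_mean_variance`, the synthetic Poisson data, and the chunk size are assumptions made for illustration only.

```python
import numpy as np


def chunked_mean_variance(chunks, ddof=1):
    """Accumulate per-column mean and variance over an iterable of 2-D chunks.

    Hypothetical helper for illustration; it is not part of cellxgene_census.
    """
    n = 0        # number of rows seen so far
    mean = None  # running per-column mean
    m2 = None    # running per-column sum of squared deviations from the mean

    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=np.float64)
        n_b = chunk.shape[0]
        if n_b == 0:
            continue

        # Statistics of the current chunk alone.
        mean_b = chunk.mean(axis=0)
        m2_b = ((chunk - mean_b) ** 2).sum(axis=0)

        if n == 0:
            n, mean, m2 = n_b, mean_b, m2_b
            continue

        # Merge the chunk statistics into the running statistics.
        delta = mean_b - mean
        total = n + n_b
        mean = mean + delta * n_b / total
        m2 = m2 + m2_b + delta**2 * n * n_b / total
        n = total

    if n == 0:
        raise ValueError("no data was provided")

    # The divisor is N - ddof, matching the meaning of the `ddof` argument.
    variance = m2 / (n - ddof)
    return mean, variance


# Synthetic count-like data, processed 128 rows at a time.
rng = np.random.default_rng(0)
data = rng.poisson(2.0, size=(1000, 5))
chunks = (data[i : i + 128] for i in range(0, data.shape[0], 128))
mean, variance = chunked_mean_variance(chunks, ddof=1)
```

Only one chunk is held in memory at a time, and the result agrees with a single-pass `data.mean(axis=0)` and `data.var(axis=0, ddof=1)` up to floating-point error.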
| soma_joinid | dataset_id | assay | assay_ontology_term_id | cell_type | cell_type_ontology_term_id | development_stage | development_stage_ontology_term_id | disease | disease_ontology_term_id | donor_id | ... | self_reported_ethnicity_ontology_term_id | sex | sex_ontology_term_id | suspension_type | tissue | tissue_ontology_term_id | tissue_general | tissue_general_ontology_term_id | mean | variance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3095357 | 98e5ea9f-16d6-47ec-a529-686e76515e39 | Smart-seq2 | EFO:0008931 | keratinocyte stem cell | CL:0002337 | 18 month-old stage | MmusDv:0000089 | normal | PATO:0000461 | 18_53_M | ... | na | male | PATO:0000384 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 15.915025 | 69571.774917 |
| 3095359 | 98e5ea9f-16d6-47ec-a529-686e76515e39 | Smart-seq2 | EFO:0008931 | keratinocyte stem cell | CL:0002337 | 18 month-old stage | MmusDv:0000089 | normal | PATO:0000461 | 18_47_F | ... | na | female | PATO:0000383 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 5.972801 | 9471.427044 |
| 3095363 | 98e5ea9f-16d6-47ec-a529-686e76515e39 | Smart-seq2 | EFO:0008931 | keratinocyte stem cell | CL:0002337 | 18 month-old stage | MmusDv:0000089 | normal | PATO:0000461 | 18_47_F | ... | na | female | PATO:0000383 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 25.169472 | 139042.208628 |
| 3095366 | 98e5ea9f-16d6-47ec-a529-686e76515e39 | Smart-seq2 | EFO:0008931 | keratinocyte stem cell | CL:0002337 | 18 month-old stage | MmusDv:0000089 | normal | PATO:0000461 | 18_53_M | ... | na | male | PATO:0000384 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 8.049836 | 24762.926397 |
| 3095368 | 98e5ea9f-16d6-47ec-a529-686e76515e39 | Smart-seq2 | EFO:0008931 | keratinocyte stem cell | CL:0002337 | 18 month-old stage | MmusDv:0000089 | normal | PATO:0000461 | 18_47_F | ... | na | female | PATO:0000383 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 17.345415 | 150412.440839 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3278898 | 48b37086-25f7-4ecd-be66-f5bb378e3aea | 10x 3' v2 | EFO:0009899 | basal cell of epidermis | CL:0002187 | 20 month-old stage and over | MmusDv:0000091 | normal | PATO:0000461 | 21-F-55 | ... | na | female | PATO:0000383 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 0.164319 | 5.339741 |
| 3278899 | 48b37086-25f7-4ecd-be66-f5bb378e3aea | 10x 3' v2 | EFO:0009899 | basal cell of epidermis | CL:0002187 | 20 month-old stage and over | MmusDv:0000091 | normal | PATO:0000461 | 21-F-55 | ... | na | female | PATO:0000383 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 0.368339 | 24.930156 |
| 3278900 | 48b37086-25f7-4ecd-be66-f5bb378e3aea | 10x 3' v2 | EFO:0009899 | epidermal cell | CL:0000362 | 20 month-old stage and over | MmusDv:0000091 | normal | PATO:0000461 | 21-F-55 | ... | na | female | PATO:0000383 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 0.246049 | 11.886186 |
| 3278901 | 48b37086-25f7-4ecd-be66-f5bb378e3aea | 10x 3' v2 | EFO:0009899 | basal cell of epidermis | CL:0002187 | 20 month-old stage and over | MmusDv:0000091 | normal | PATO:0000461 | 21-F-55 | ... | na | female | PATO:0000383 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 0.240724 | 10.307266 |
| 3278902 | 48b37086-25f7-4ecd-be66-f5bb378e3aea | 10x 3' v2 | EFO:0009899 | epidermal cell | CL:0000362 | 20 month-old stage and over | MmusDv:0000091 | normal | PATO:0000461 | 21-F-55 | ... | na | female | PATO:0000383 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 0.278420 | 16.086994 |

9314 rows × 22 columns
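
The table above places `obs` metadata next to the `mean` and `variance` columns, keyed by `soma_joinid`. One way such a table could be assembled with pandas is an index join, sketched below. The variable names `mv_df`, `obs_df`, and `combined` are hypothetical, only a few metadata columns are included, and the two rows are copied from the output shown above rather than read from the Census.

```python
import pandas as pd

# Mean/variance frame indexed by soma_joinid (values copied from the output above).
mv_df = pd.DataFrame(
    {"mean": [15.915025, 5.972801], "variance": [69571.774917, 9471.427044]},
    index=pd.Index([3095357, 3095359], name="soma_joinid"),
)

# A small slice of obs metadata with soma_joinid as an ordinary column.
obs_df = pd.DataFrame(
    {
        "soma_joinid": [3095357, 3095359],
        "assay": ["Smart-seq2", "Smart-seq2"],
        "cell_type": ["keratinocyte stem cell", "keratinocyte stem cell"],
        "sex": ["male", "female"],
    }
)

# Index obs by soma_joinid, then attach the statistics row by row.
combined = obs_df.set_index("soma_joinid").join(mv_df)
print(combined)
```

Because both frames share the `soma_joinid` index, `join` aligns each cell's statistics with its metadata regardless of row order.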
\n", "