{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "c4b089f9-5304-4479-b739-20792e12595a", "metadata": {}, "source": [ "# Out-of-core (incremental) mean and variance calculation\n", "\n", "This tutorial describes how to use the `cellxgene_census.experimental.pp` API to calculate the mean and variance out-of-core in the Census. The variance calculation is performed using [Welford's online algorithm](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance).\n", "\n", "**Contents**\n", "\n", "1. The mean and variance API.\n", "2. Example: calculate mean and variance for a slice of the Census.\n", "\n", "## The mean and variance API\n", "\n", "`mean_variance()` calculates the mean and the variance for an `ExperimentAxisQuery`. The following additional arguments are supported:\n", "\n", "- `layer`: the X layer used for the calculation; defaults to `raw`.\n", "- `axis`: the axis along which the calculation is performed. Specify `0` for the `var` axis and `1` for the `obs` axis.\n", "- `calculate_mean`: if `False`, do not include the mean in the result.\n", "- `calculate_variance`: if `False`, do not compute the variance.\n", "- `ddof`: _Delta Degrees of Freedom_. The divisor used in the variance calculation is `N - ddof`, where `N` is the number of elements.\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "a06549db-60c5-49b6-b57d-29ee6db2c61c", "metadata": { "execution": { "iopub.execute_input": "2023-07-28T16:32:46.273919Z", "iopub.status.busy": "2023-07-28T16:32:46.273614Z", "iopub.status.idle": "2023-07-28T16:32:51.270141Z", "shell.execute_reply": "2023-07-28T16:32:51.269514Z" }, "tags": [] }, "outputs": [], "source": [ "# Import packages\n", "import cellxgene_census\n", "import pandas as pd\n", "import tiledbsoma as soma\n", "from cellxgene_census.experimental.pp import mean_variance" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c5385575-d591-4c5c-8bc3-aea4ea02a8da", "metadata": { "tags": [] }, "source": [ "## Example: 
calculate mean and variance for a slice of the Census" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8dc17793-062b-4ac4-aaa6-32cfe8bb74ac", "metadata": { "tags": [] }, "source": [ "As an example, we'll calculate the mean and variance along the `obs` axis for a subset of cells from the Mouse census.\n", "\n", "The return value will be a Pandas dataframe indexed by `soma_joinid` (in this case, it will be relative to `obs`) and will contain the `mean` and `variance` columns." ] }, { "cell_type": "code", "execution_count": 2, "id": "95d8bbf8-405e-40bd-90d6-366ce20a11bc", "metadata": { "execution": { "iopub.execute_input": "2023-07-28T16:32:51.273381Z", "iopub.status.busy": "2023-07-28T16:32:51.272927Z", "iopub.status.idle": "2023-07-28T16:32:58.442151Z", "shell.execute_reply": "2023-07-28T16:32:58.441609Z" }, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The \"stable\" release is currently 2023-07-25. Specify 'census_version=\"2023-07-25\"' in future calls to open_soma() to ensure data consistency.\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>mean</th><th>variance</th></tr>\n",
"    <tr><th>soma_joinid</th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>3095357</th><td>15.915025</td><td>69571.774917</td></tr>\n",
"    <tr><th>3095359</th><td>5.972801</td><td>9471.427044</td></tr>\n",
"    <tr><th>3095363</th><td>25.169472</td><td>139042.208628</td></tr>\n",
"    <tr><th>3095366</th><td>8.049836</td><td>24762.926397</td></tr>\n",
"    <tr><th>3095368</th><td>17.345415</td><td>150412.440839</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td></tr>\n",
"    <tr><th>3278898</th><td>0.164319</td><td>5.339741</td></tr>\n",
"    <tr><th>3278899</th><td>0.368339</td><td>24.930156</td></tr>\n",
"    <tr><th>3278900</th><td>0.246049</td><td>11.886186</td></tr>\n",
"    <tr><th>3278901</th><td>0.240724</td><td>10.307266</td></tr>\n",
"    <tr><th>3278902</th><td>0.278420</td><td>16.086994</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>9314 rows × 2 columns</p>\n",
"</div>
" ], "text/plain": [ " mean variance\n", "soma_joinid \n", "3095357 15.915025 69571.774917\n", "3095359 5.972801 9471.427044\n", "3095363 25.169472 139042.208628\n", "3095366 8.049836 24762.926397\n", "3095368 17.345415 150412.440839\n", "... ... ...\n", "3278898 0.164319 5.339741\n", "3278899 0.368339 24.930156\n", "3278900 0.246049 11.886186\n", "3278901 0.240724 10.307266\n", "3278902 0.278420 16.086994\n", "\n", "[9314 rows x 2 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "experiment_name = \"mus_musculus\"\n", "obs_value_filter = 'is_primary_data == True and tissue_general == \"skin of body\"'\n", "\n", "with cellxgene_census.open_soma(census_version=\"stable\") as census:\n", " with census[\"census_data\"][experiment_name].axis_query(\n", " measurement_name=\"RNA\", obs_query=soma.AxisQuery(value_filter=obs_value_filter)\n", " ) as query:\n", " mv_df = mean_variance(\n", " query,\n", " axis=1,\n", " calculate_mean=True,\n", " calculate_variance=True,\n", " )\n", "\n", " obs_df = query.obs().concat().to_pandas()\n", "\n", "mv_df" ] }, { "attachments": {}, "cell_type": "markdown", "id": "efe6863e-9166-47ec-b096-f4d12c0cbda1", "metadata": { "tags": [] }, "source": [ "We can now concatenate the resulting dataframe to `obs`:" ] }, { "cell_type": "code", "execution_count": 3, "id": "b0f6f254-852d-4a0a-97af-e00a77b415c4", "metadata": { "execution": { "iopub.execute_input": "2023-07-28T16:32:58.444770Z", "iopub.status.busy": "2023-07-28T16:32:58.444502Z", "iopub.status.idle": "2023-07-28T16:32:58.468958Z", "shell.execute_reply": "2023-07-28T16:32:58.468442Z" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>dataset_id</th><th>assay</th><th>assay_ontology_term_id</th><th>cell_type</th><th>cell_type_ontology_term_id</th><th>development_stage</th><th>development_stage_ontology_term_id</th><th>disease</th><th>disease_ontology_term_id</th><th>donor_id</th><th>...</th><th>self_reported_ethnicity_ontology_term_id</th><th>sex</th><th>sex_ontology_term_id</th><th>suspension_type</th><th>tissue</th><th>tissue_ontology_term_id</th><th>tissue_general</th><th>tissue_general_ontology_term_id</th><th>mean</th><th>variance</th></tr>\n",
"    <tr><th>soma_joinid</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>3095357</th><td>98e5ea9f-16d6-47ec-a529-686e76515e39</td><td>Smart-seq2</td><td>EFO:0008931</td><td>keratinocyte stem cell</td><td>CL:0002337</td><td>18 month-old stage</td><td>MmusDv:0000089</td><td>normal</td><td>PATO:0000461</td><td>18_53_M</td><td>...</td><td>na</td><td>male</td><td>PATO:0000384</td><td>cell</td><td>skin of body</td><td>UBERON:0002097</td><td>skin of body</td><td>UBERON:0002097</td><td>15.915025</td><td>69571.774917</td></tr>\n",
"    <tr><th>3095359</th><td>98e5ea9f-16d6-47ec-a529-686e76515e39</td><td>Smart-seq2</td><td>EFO:0008931</td><td>keratinocyte stem cell</td><td>CL:0002337</td><td>18 month-old stage</td><td>MmusDv:0000089</td><td>normal</td><td>PATO:0000461</td><td>18_47_F</td><td>...</td><td>na</td><td>female</td><td>PATO:0000383</td><td>cell</td><td>skin of body</td><td>UBERON:0002097</td><td>skin of body</td><td>UBERON:0002097</td><td>5.972801</td><td>9471.427044</td></tr>\n",
"    <tr><th>3095363</th><td>98e5ea9f-16d6-47ec-a529-686e76515e39</td><td>Smart-seq2</td><td>EFO:0008931</td><td>keratinocyte stem cell</td><td>CL:0002337</td><td>18 month-old stage</td><td>MmusDv:0000089</td><td>normal</td><td>PATO:0000461</td><td>18_47_F</td><td>...</td><td>na</td><td>female</td><td>PATO:0000383</td><td>cell</td><td>skin of body</td><td>UBERON:0002097</td><td>skin of body</td><td>UBERON:0002097</td><td>25.169472</td><td>139042.208628</td></tr>\n",
"    <tr><th>3095366</th><td>98e5ea9f-16d6-47ec-a529-686e76515e39</td><td>Smart-seq2</td><td>EFO:0008931</td><td>keratinocyte stem cell</td><td>CL:0002337</td><td>18 month-old stage</td><td>MmusDv:0000089</td><td>normal</td><td>PATO:0000461</td><td>18_53_M</td><td>...</td><td>na</td><td>male</td><td>PATO:0000384</td><td>cell</td><td>skin of body</td><td>UBERON:0002097</td><td>skin of body</td><td>UBERON:0002097</td><td>8.049836</td><td>24762.926397</td></tr>\n",
"    <tr><th>3095368</th><td>98e5ea9f-16d6-47ec-a529-686e76515e39</td><td>Smart-seq2</td><td>EFO:0008931</td><td>keratinocyte stem cell</td><td>CL:0002337</td><td>18 month-old stage</td><td>MmusDv:0000089</td><td>normal</td><td>PATO:0000461</td><td>18_47_F</td><td>...</td><td>na</td><td>female</td><td>PATO:0000383</td><td>cell</td><td>skin of body</td><td>UBERON:0002097</td><td>skin of body</td><td>UBERON:0002097</td><td>17.345415</td><td>150412.440839</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>3278898</th><td>48b37086-25f7-4ecd-be66-f5bb378e3aea</td><td>10x 3' v2</td><td>EFO:0009899</td><td>basal cell of epidermis</td><td>CL:0002187</td><td>20 month-old stage and over</td><td>MmusDv:0000091</td><td>normal</td><td>PATO:0000461</td><td>21-F-55</td><td>...</td><td>na</td><td>female</td><td>PATO:0000383</td><td>cell</td><td>skin of body</td><td>UBERON:0002097</td><td>skin of body</td><td>UBERON:0002097</td><td>0.164319</td><td>5.339741</td></tr>\n",
"    <tr><th>3278899</th><td>48b37086-25f7-4ecd-be66-f5bb378e3aea</td><td>10x 3' v2</td><td>EFO:0009899</td><td>basal cell of epidermis</td><td>CL:0002187</td><td>20 month-old stage and over</td><td>MmusDv:0000091</td><td>normal</td><td>PATO:0000461</td><td>21-F-55</td><td>...</td><td>na</td><td>female</td><td>PATO:0000383</td><td>cell</td><td>skin of body</td><td>UBERON:0002097</td><td>skin of body</td><td>UBERON:0002097</td><td>0.368339</td><td>24.930156</td></tr>\n",
"    <tr><th>3278900</th><td>48b37086-25f7-4ecd-be66-f5bb378e3aea</td><td>10x 3' v2</td><td>EFO:0009899</td><td>epidermal cell</td><td>CL:0000362</td><td>20 month-old stage and over</td><td>MmusDv:0000091</td><td>normal</td><td>PATO:0000461</td><td>21-F-55</td><td>...</td><td>na</td><td>female</td><td>PATO:0000383</td><td>cell</td><td>skin of body</td><td>UBERON:0002097</td><td>skin of body</td><td>UBERON:0002097</td><td>0.246049</td><td>11.886186</td></tr>\n",
"    <tr><th>3278901</th><td>48b37086-25f7-4ecd-be66-f5bb378e3aea</td><td>10x 3' v2</td><td>EFO:0009899</td><td>basal cell of epidermis</td><td>CL:0002187</td><td>20 month-old stage and over</td><td>MmusDv:0000091</td><td>normal</td><td>PATO:0000461</td><td>21-F-55</td><td>...</td><td>na</td><td>female</td><td>PATO:0000383</td><td>cell</td><td>skin of body</td><td>UBERON:0002097</td><td>skin of body</td><td>UBERON:0002097</td><td>0.240724</td><td>10.307266</td></tr>\n",
"    <tr><th>3278902</th><td>48b37086-25f7-4ecd-be66-f5bb378e3aea</td><td>10x 3' v2</td><td>EFO:0009899</td><td>epidermal cell</td><td>CL:0000362</td><td>20 month-old stage and over</td><td>MmusDv:0000091</td><td>normal</td><td>PATO:0000461</td><td>21-F-55</td><td>...</td><td>na</td><td>female</td><td>PATO:0000383</td><td>cell</td><td>skin of body</td><td>UBERON:0002097</td><td>skin of body</td><td>UBERON:0002097</td><td>0.278420</td><td>16.086994</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>9314 rows × 22 columns</p>\n",
"</div>
" ], "text/plain": [ " dataset_id assay \\\n", "soma_joinid \n", "3095357 98e5ea9f-16d6-47ec-a529-686e76515e39 Smart-seq2 \n", "3095359 98e5ea9f-16d6-47ec-a529-686e76515e39 Smart-seq2 \n", "3095363 98e5ea9f-16d6-47ec-a529-686e76515e39 Smart-seq2 \n", "3095366 98e5ea9f-16d6-47ec-a529-686e76515e39 Smart-seq2 \n", "3095368 98e5ea9f-16d6-47ec-a529-686e76515e39 Smart-seq2 \n", "... ... ... \n", "3278898 48b37086-25f7-4ecd-be66-f5bb378e3aea 10x 3' v2 \n", "3278899 48b37086-25f7-4ecd-be66-f5bb378e3aea 10x 3' v2 \n", "3278900 48b37086-25f7-4ecd-be66-f5bb378e3aea 10x 3' v2 \n", "3278901 48b37086-25f7-4ecd-be66-f5bb378e3aea 10x 3' v2 \n", "3278902 48b37086-25f7-4ecd-be66-f5bb378e3aea 10x 3' v2 \n", "\n", " assay_ontology_term_id cell_type \\\n", "soma_joinid \n", "3095357 EFO:0008931 keratinocyte stem cell \n", "3095359 EFO:0008931 keratinocyte stem cell \n", "3095363 EFO:0008931 keratinocyte stem cell \n", "3095366 EFO:0008931 keratinocyte stem cell \n", "3095368 EFO:0008931 keratinocyte stem cell \n", "... ... ... \n", "3278898 EFO:0009899 basal cell of epidermis \n", "3278899 EFO:0009899 basal cell of epidermis \n", "3278900 EFO:0009899 epidermal cell \n", "3278901 EFO:0009899 basal cell of epidermis \n", "3278902 EFO:0009899 epidermal cell \n", "\n", " cell_type_ontology_term_id development_stage \\\n", "soma_joinid \n", "3095357 CL:0002337 18 month-old stage \n", "3095359 CL:0002337 18 month-old stage \n", "3095363 CL:0002337 18 month-old stage \n", "3095366 CL:0002337 18 month-old stage \n", "3095368 CL:0002337 18 month-old stage \n", "... ... ... 
\n", "3278898 CL:0002187 20 month-old stage and over \n", "3278899 CL:0002187 20 month-old stage and over \n", "3278900 CL:0000362 20 month-old stage and over \n", "3278901 CL:0002187 20 month-old stage and over \n", "3278902 CL:0000362 20 month-old stage and over \n", "\n", " development_stage_ontology_term_id disease \\\n", "soma_joinid \n", "3095357 MmusDv:0000089 normal \n", "3095359 MmusDv:0000089 normal \n", "3095363 MmusDv:0000089 normal \n", "3095366 MmusDv:0000089 normal \n", "3095368 MmusDv:0000089 normal \n", "... ... ... \n", "3278898 MmusDv:0000091 normal \n", "3278899 MmusDv:0000091 normal \n", "3278900 MmusDv:0000091 normal \n", "3278901 MmusDv:0000091 normal \n", "3278902 MmusDv:0000091 normal \n", "\n", " disease_ontology_term_id donor_id ... \\\n", "soma_joinid ... \n", "3095357 PATO:0000461 18_53_M ... \n", "3095359 PATO:0000461 18_47_F ... \n", "3095363 PATO:0000461 18_47_F ... \n", "3095366 PATO:0000461 18_53_M ... \n", "3095368 PATO:0000461 18_47_F ... \n", "... ... ... ... \n", "3278898 PATO:0000461 21-F-55 ... \n", "3278899 PATO:0000461 21-F-55 ... \n", "3278900 PATO:0000461 21-F-55 ... \n", "3278901 PATO:0000461 21-F-55 ... \n", "3278902 PATO:0000461 21-F-55 ... \n", "\n", " self_reported_ethnicity_ontology_term_id sex \\\n", "soma_joinid \n", "3095357 na male \n", "3095359 na female \n", "3095363 na female \n", "3095366 na male \n", "3095368 na female \n", "... ... ... \n", "3278898 na female \n", "3278899 na female \n", "3278900 na female \n", "3278901 na female \n", "3278902 na female \n", "\n", " sex_ontology_term_id suspension_type tissue \\\n", "soma_joinid \n", "3095357 PATO:0000384 cell skin of body \n", "3095359 PATO:0000383 cell skin of body \n", "3095363 PATO:0000383 cell skin of body \n", "3095366 PATO:0000384 cell skin of body \n", "3095368 PATO:0000383 cell skin of body \n", "... ... ... ... 
\n", "3278898 PATO:0000383 cell skin of body \n", "3278899 PATO:0000383 cell skin of body \n", "3278900 PATO:0000383 cell skin of body \n", "3278901 PATO:0000383 cell skin of body \n", "3278902 PATO:0000383 cell skin of body \n", "\n", " tissue_ontology_term_id tissue_general \\\n", "soma_joinid \n", "3095357 UBERON:0002097 skin of body \n", "3095359 UBERON:0002097 skin of body \n", "3095363 UBERON:0002097 skin of body \n", "3095366 UBERON:0002097 skin of body \n", "3095368 UBERON:0002097 skin of body \n", "... ... ... \n", "3278898 UBERON:0002097 skin of body \n", "3278899 UBERON:0002097 skin of body \n", "3278900 UBERON:0002097 skin of body \n", "3278901 UBERON:0002097 skin of body \n", "3278902 UBERON:0002097 skin of body \n", "\n", " tissue_general_ontology_term_id mean variance \n", "soma_joinid \n", "3095357 UBERON:0002097 15.915025 69571.774917 \n", "3095359 UBERON:0002097 5.972801 9471.427044 \n", "3095363 UBERON:0002097 25.169472 139042.208628 \n", "3095366 UBERON:0002097 8.049836 24762.926397 \n", "3095368 UBERON:0002097 17.345415 150412.440839 \n", "... ... ... ... \n", "3278898 UBERON:0002097 0.164319 5.339741 \n", "3278899 UBERON:0002097 0.368339 24.930156 \n", "3278900 UBERON:0002097 0.246049 11.886186 \n", "3278901 UBERON:0002097 0.240724 10.307266 \n", "3278902 UBERON:0002097 0.278420 16.086994 \n", "\n", "[9314 rows x 22 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "combined_df = pd.concat([obs_df.set_index(\"soma_joinid\"), mv_df], axis=1)\n", "combined_df" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.10" } }, "nbformat": 4, "nbformat_minor": 5 }