Out-of-core (incremental) mean and variance calculation
This tutorial describes how to use the cellxgene_census.experimental.pp API to calculate the mean and variance out-of-core in the Census, that is, without loading the full expression matrix into memory. The variance calculation is performed using Welford’s online algorithm.
Contents
The mean and variance API.
Example: calculate mean and variance for a slice of the Census.
The mean and variance API
mean_variance() calculates the mean and the variance for an ExperimentAxisQuery. The following additional arguments are supported:
layer: the X layer used for the calculation, defaults to raw
axis: the axis along which the calculation is performed. Specify 0 for the var axis and 1 for the obs axis
calculate_mean: if False, do not include the mean in the result
calculate_variance: if False, do not compute the variance
ddof: Delta Degrees of Freedom: the divisor used in the calculation for variance is N - ddof, where N represents the number of elements.
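To make the incremental idea concrete, here is a minimal sketch of a chunked accumulator. It is not the library's implementation: `RunningMeanVar` is a hypothetical helper, and it uses the batched form of Welford's update (the combination rule of Chan et al.), which reduces to Welford's single-element update when a chunk holds one column. The `ddof` argument plays the same role as described above: the divisor is N - ddof.

```python
import numpy as np


class RunningMeanVar:
    """Hypothetical helper: per-row mean and variance accumulated over column chunks."""

    def __init__(self, n_rows: int) -> None:
        self.count = 0                # elements seen so far in each row
        self.mean = np.zeros(n_rows)  # running mean per row
        self.m2 = np.zeros(n_rows)    # running sum of squared deviations per row

    def update(self, chunk: np.ndarray) -> None:
        """Fold a (n_rows, k) block of new columns into the running statistics."""
        k = chunk.shape[1]
        if k == 0:
            return
        chunk_mean = chunk.mean(axis=1)
        chunk_m2 = ((chunk - chunk_mean[:, None]) ** 2).sum(axis=1)
        delta = chunk_mean - self.mean
        total = self.count + k
        # Batched combination step; with k == 1 this is Welford's update.
        self.mean = self.mean + delta * k / total
        self.m2 = self.m2 + chunk_m2 + delta**2 * self.count * k / total
        self.count = total

    def variance(self, ddof: int = 1) -> np.ndarray:
        """Variance with divisor N - ddof, where N is the number of elements seen."""
        return self.m2 / (self.count - ddof)


# Toy data standing in for an expression matrix (5 cells x 1000 genes).
rng = np.random.default_rng(0)
x = rng.poisson(2.0, size=(5, 1000)).astype(float)

acc = RunningMeanVar(n_rows=x.shape[0])
for start in range(0, x.shape[1], 128):
    # Only one 128-column block is needed in memory at a time.
    acc.update(x[:, start : start + 128])

# The streamed result should agree with the in-memory computation.
print(np.allclose(acc.mean, x.mean(axis=1)))
print(np.allclose(acc.variance(ddof=1), x.var(axis=1, ddof=1)))
```

Only the running count, mean, and sum of squared deviations are kept between chunks, so memory use does not grow with the number of columns processed.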
# Import packages
import cellxgene_census
import pandas as pd
import tiledbsoma as soma
from cellxgene_census.experimental.pp import mean_variance
Example: calculate mean and variance for a slice of the Census
As an example, we’ll calculate the mean and variance along the obs axis for a subset of cells from the Mouse census.
The return value will be a Pandas dataframe indexed by soma_joinid (in this case, it will be relative to obs) and will contain the mean and variance columns.
experiment_name = "mus_musculus"
obs_value_filter = 'is_primary_data == True and tissue_general == "skin of body"'

with cellxgene_census.open_soma(census_version="stable") as census:
    with census["census_data"][experiment_name].axis_query(
        measurement_name="RNA", obs_query=soma.AxisQuery(value_filter=obs_value_filter)
    ) as query:
        # axis=1 yields one row of statistics per obs soma_joinid
        mv_df = mean_variance(
            query,
            axis=1,
            calculate_mean=True,
            calculate_variance=True,
        )
        # Fetch the matching obs metadata while the query is still open
        obs_df = query.obs().concat().to_pandas()

mv_df
The "stable" release is currently 2023-07-25. Specify 'census_version="2023-07-25"' in future calls to open_soma() to ensure data consistency.
soma_joinid | mean | variance |
---|---|---|
3095357 | 15.915025 | 69571.774917 |
3095359 | 5.972801 | 9471.427044 |
3095363 | 25.169472 | 139042.208628 |
3095366 | 8.049836 | 24762.926397 |
3095368 | 17.345415 | 150412.440839 |
... | ... | ... |
3278898 | 0.164319 | 5.339741 |
3278899 | 0.368339 | 24.930156 |
3278900 | 0.246049 | 11.886186 |
3278901 | 0.240724 | 10.307266 |
3278902 | 0.278420 | 16.086994 |
9314 rows × 2 columns
We can now concatenate the resulting dataframe with obs, aligning rows on soma_joinid:
combined_df = pd.concat([obs_df.set_index("soma_joinid"), mv_df], axis=1)
combined_df
soma_joinid | dataset_id | assay | assay_ontology_term_id | cell_type | cell_type_ontology_term_id | development_stage | development_stage_ontology_term_id | disease | disease_ontology_term_id | donor_id | ... | self_reported_ethnicity_ontology_term_id | sex | sex_ontology_term_id | suspension_type | tissue | tissue_ontology_term_id | tissue_general | tissue_general_ontology_term_id | mean | variance |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3095357 | 98e5ea9f-16d6-47ec-a529-686e76515e39 | Smart-seq2 | EFO:0008931 | keratinocyte stem cell | CL:0002337 | 18 month-old stage | MmusDv:0000089 | normal | PATO:0000461 | 18_53_M | ... | na | male | PATO:0000384 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 15.915025 | 69571.774917 |
3095359 | 98e5ea9f-16d6-47ec-a529-686e76515e39 | Smart-seq2 | EFO:0008931 | keratinocyte stem cell | CL:0002337 | 18 month-old stage | MmusDv:0000089 | normal | PATO:0000461 | 18_47_F | ... | na | female | PATO:0000383 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 5.972801 | 9471.427044 |
3095363 | 98e5ea9f-16d6-47ec-a529-686e76515e39 | Smart-seq2 | EFO:0008931 | keratinocyte stem cell | CL:0002337 | 18 month-old stage | MmusDv:0000089 | normal | PATO:0000461 | 18_47_F | ... | na | female | PATO:0000383 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 25.169472 | 139042.208628 |
3095366 | 98e5ea9f-16d6-47ec-a529-686e76515e39 | Smart-seq2 | EFO:0008931 | keratinocyte stem cell | CL:0002337 | 18 month-old stage | MmusDv:0000089 | normal | PATO:0000461 | 18_53_M | ... | na | male | PATO:0000384 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 8.049836 | 24762.926397 |
3095368 | 98e5ea9f-16d6-47ec-a529-686e76515e39 | Smart-seq2 | EFO:0008931 | keratinocyte stem cell | CL:0002337 | 18 month-old stage | MmusDv:0000089 | normal | PATO:0000461 | 18_47_F | ... | na | female | PATO:0000383 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 17.345415 | 150412.440839 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3278898 | 48b37086-25f7-4ecd-be66-f5bb378e3aea | 10x 3' v2 | EFO:0009899 | basal cell of epidermis | CL:0002187 | 20 month-old stage and over | MmusDv:0000091 | normal | PATO:0000461 | 21-F-55 | ... | na | female | PATO:0000383 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 0.164319 | 5.339741 |
3278899 | 48b37086-25f7-4ecd-be66-f5bb378e3aea | 10x 3' v2 | EFO:0009899 | basal cell of epidermis | CL:0002187 | 20 month-old stage and over | MmusDv:0000091 | normal | PATO:0000461 | 21-F-55 | ... | na | female | PATO:0000383 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 0.368339 | 24.930156 |
3278900 | 48b37086-25f7-4ecd-be66-f5bb378e3aea | 10x 3' v2 | EFO:0009899 | epidermal cell | CL:0000362 | 20 month-old stage and over | MmusDv:0000091 | normal | PATO:0000461 | 21-F-55 | ... | na | female | PATO:0000383 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 0.246049 | 11.886186 |
3278901 | 48b37086-25f7-4ecd-be66-f5bb378e3aea | 10x 3' v2 | EFO:0009899 | basal cell of epidermis | CL:0002187 | 20 month-old stage and over | MmusDv:0000091 | normal | PATO:0000461 | 21-F-55 | ... | na | female | PATO:0000383 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 0.240724 | 10.307266 |
3278902 | 48b37086-25f7-4ecd-be66-f5bb378e3aea | 10x 3' v2 | EFO:0009899 | epidermal cell | CL:0000362 | 20 month-old stage and over | MmusDv:0000091 | normal | PATO:0000461 | 21-F-55 | ... | na | female | PATO:0000383 | cell | skin of body | UBERON:0002097 | skin of body | UBERON:0002097 | 0.278420 | 16.086994 |
9314 rows × 22 columns
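Once the statistics sit next to the obs metadata, they can be summarized per group with ordinary pandas operations. The sketch below uses a small hand-made stand-in for the combined dataframe (`toy_df` and its numbers are illustrative, not Census values) and takes the median of the per-cell mean and variance for each cell type.

```python
import pandas as pd

# Toy stand-in for the combined dataframe: obs metadata plus mean/variance columns.
# The values are made up for illustration.
toy_df = pd.DataFrame(
    {
        "cell_type": ["keratinocyte stem cell", "keratinocyte stem cell", "epidermal cell"],
        "assay": ["Smart-seq2", "Smart-seq2", "10x 3' v2"],
        "mean": [15.0, 5.0, 0.25],
        "variance": [100.0, 50.0, 12.0],
    },
    index=pd.Index([10, 11, 12], name="soma_joinid"),
)

# observed=True skips unused categories if the grouping column is categorical.
summary = toy_df.groupby("cell_type", observed=True)[["mean", "variance"]].median()
print(summary)
```

The same call should work on the real combined dataframe by substituting it for `toy_df`; grouping by assay instead of cell_type may also be informative, since the two assays in the output above show very different scales.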