Out-of-core (incremental) mean and variance calculation

This tutorial describes how to use the cellxgene_census.experimental.pp API to calculate the mean and variance of Census data out of core, that is, without loading the full X matrix into memory. The variance is computed with Welford’s online algorithm.

Contents

  1. The mean and variance API.

  2. Example: calculate mean and variance for a slice of the Census.

The mean and variance API

mean_variance() calculates the mean and the variance for an ExperimentAxisQuery. The following additional arguments are supported:

  • layer: the X layer used for the calculation. Defaults to raw.

  • axis: the axis along which the calculation is performed. Specify 0 for the var axis and 1 for the obs axis (with axis=1, the result has one row per cell, as in the example below).

  • calculate_mean: if False, do not include the mean in the result.

  • calculate_variance: if False, do not compute the variance.

  • ddof: delta degrees of freedom. The divisor used in the variance calculation is N - ddof, where N is the number of elements.
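To make the axis and ddof arguments concrete, here is a small in-memory NumPy sketch. It is only an analogy: the toy matrix and variable names are hypothetical, and it assumes that rows stand for obs (cells) and columns for var (genes), so that axis=1 yields one value per cell as in the example later in this tutorial. It does not call mean_variance() and does not reflect how the Census streams data.

```python
import numpy as np

# Hypothetical toy matrix: 3 cells (obs, rows) x 4 genes (var, columns).
X = np.array(
    [
        [0.0, 2.0, 4.0, 6.0],
        [1.0, 1.0, 1.0, 1.0],
        [0.0, 0.0, 3.0, 9.0],
    ]
)

# axis=1: reduce across genes, giving one mean/variance per cell (obs).
per_obs_mean = X.mean(axis=1)
per_obs_var = X.var(axis=1, ddof=1)  # divisor is N - 1, with N = 4 genes

# axis=0: reduce across cells, giving one mean/variance per gene (var).
per_var_mean = X.mean(axis=0)
per_var_var = X.var(axis=0, ddof=0)  # divisor is N - 0, with N = 3 cells

print(per_obs_mean, per_obs_var)
print(per_var_mean, per_var_var)
```

With ddof=1 the result is the sample variance; with ddof=0 it is the population variance. The only difference is the divisor N - ddof.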

[1]:
# Import packages
import cellxgene_census
import pandas as pd
import tiledbsoma as soma
from cellxgene_census.experimental.pp import mean_variance
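
The introduction mentions that the variance is computed with Welford’s online algorithm. As background, the following is a minimal, single-value-at-a-time sketch of that algorithm in plain Python. The RunningStats class and its method names are hypothetical and written for illustration only; this is not the code used inside cellxgene_census, which may process data in batches and handle sparse values differently.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running count, mean, and sum of squared deviations from the mean (m2)."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        """Fold one new value into the running statistics."""
        self.n += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.n    # updated mean
        self.m2 += delta * (x - self.mean)  # uses old and new mean

    def variance(self, ddof: int = 1) -> float:
        """Return m2 / (n - ddof), the variance with the given ddof."""
        if self.n - ddof <= 0:
            raise ValueError("need more than ddof values to compute variance")
        return self.m2 / (self.n - ddof)


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)  # only three numbers are kept in memory

print(stats.mean, stats.variance(ddof=0), stats.variance(ddof=1))
```

Because each update only needs the current count, mean, and m2, the values never have to be held in memory at the same time, which is what makes the approach suitable for out-of-core computation.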

Example: calculate mean and variance for a slice of the Census

As an example, we’ll calculate the mean and variance along the obs axis for a subset of cells from the mouse Census.

The return value is a pandas DataFrame indexed by soma_joinid (in this case, the soma_joinid of obs) with mean and variance columns.

[2]:
experiment_name = "mus_musculus"
# Select primary-data cells from skin tissue
obs_value_filter = 'is_primary_data == True and tissue_general == "skin of body"'

with cellxgene_census.open_soma(census_version="stable") as census:
    # Build a query over the RNA measurement, restricted to the selected cells
    with census["census_data"][experiment_name].axis_query(
        measurement_name="RNA", obs_query=soma.AxisQuery(value_filter=obs_value_filter)
    ) as query:
        # axis=1: compute one mean and one variance per cell (obs)
        mv_df = mean_variance(
            query,
            axis=1,
            calculate_mean=True,
            calculate_variance=True,
        )

        # Fetch the obs metadata for the same cells
        obs_df = query.obs().concat().to_pandas()

mv_df
The "stable" release is currently 2023-07-25. Specify 'census_version="2023-07-25"' in future calls to open_soma() to ensure data consistency.
[2]:
soma_joinid       mean       variance
3095357      15.915025   69571.774917
3095359       5.972801    9471.427044
3095363      25.169472  139042.208628
3095366       8.049836   24762.926397
3095368      17.345415  150412.440839
...                ...            ...
3278898       0.164319       5.339741
3278899       0.368339      24.930156
3278900       0.246049      11.886186
3278901       0.240724      10.307266
3278902       0.278420      16.086994

9314 rows × 2 columns

We can now join the result to the obs dataframe. Once obs is indexed by soma_joinid, pd.concat with axis=1 aligns the two dataframes on that index:

[3]:
combined_df = pd.concat([obs_df.set_index("soma_joinid"), mv_df], axis=1)
combined_df
[3]:
dataset_id assay assay_ontology_term_id cell_type cell_type_ontology_term_id development_stage development_stage_ontology_term_id disease disease_ontology_term_id donor_id ... self_reported_ethnicity_ontology_term_id sex sex_ontology_term_id suspension_type tissue tissue_ontology_term_id tissue_general tissue_general_ontology_term_id mean variance
soma_joinid
3095357 98e5ea9f-16d6-47ec-a529-686e76515e39 Smart-seq2 EFO:0008931 keratinocyte stem cell CL:0002337 18 month-old stage MmusDv:0000089 normal PATO:0000461 18_53_M ... na male PATO:0000384 cell skin of body UBERON:0002097 skin of body UBERON:0002097 15.915025 69571.774917
3095359 98e5ea9f-16d6-47ec-a529-686e76515e39 Smart-seq2 EFO:0008931 keratinocyte stem cell CL:0002337 18 month-old stage MmusDv:0000089 normal PATO:0000461 18_47_F ... na female PATO:0000383 cell skin of body UBERON:0002097 skin of body UBERON:0002097 5.972801 9471.427044
3095363 98e5ea9f-16d6-47ec-a529-686e76515e39 Smart-seq2 EFO:0008931 keratinocyte stem cell CL:0002337 18 month-old stage MmusDv:0000089 normal PATO:0000461 18_47_F ... na female PATO:0000383 cell skin of body UBERON:0002097 skin of body UBERON:0002097 25.169472 139042.208628
3095366 98e5ea9f-16d6-47ec-a529-686e76515e39 Smart-seq2 EFO:0008931 keratinocyte stem cell CL:0002337 18 month-old stage MmusDv:0000089 normal PATO:0000461 18_53_M ... na male PATO:0000384 cell skin of body UBERON:0002097 skin of body UBERON:0002097 8.049836 24762.926397
3095368 98e5ea9f-16d6-47ec-a529-686e76515e39 Smart-seq2 EFO:0008931 keratinocyte stem cell CL:0002337 18 month-old stage MmusDv:0000089 normal PATO:0000461 18_47_F ... na female PATO:0000383 cell skin of body UBERON:0002097 skin of body UBERON:0002097 17.345415 150412.440839
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3278898 48b37086-25f7-4ecd-be66-f5bb378e3aea 10x 3' v2 EFO:0009899 basal cell of epidermis CL:0002187 20 month-old stage and over MmusDv:0000091 normal PATO:0000461 21-F-55 ... na female PATO:0000383 cell skin of body UBERON:0002097 skin of body UBERON:0002097 0.164319 5.339741
3278899 48b37086-25f7-4ecd-be66-f5bb378e3aea 10x 3' v2 EFO:0009899 basal cell of epidermis CL:0002187 20 month-old stage and over MmusDv:0000091 normal PATO:0000461 21-F-55 ... na female PATO:0000383 cell skin of body UBERON:0002097 skin of body UBERON:0002097 0.368339 24.930156
3278900 48b37086-25f7-4ecd-be66-f5bb378e3aea 10x 3' v2 EFO:0009899 epidermal cell CL:0000362 20 month-old stage and over MmusDv:0000091 normal PATO:0000461 21-F-55 ... na female PATO:0000383 cell skin of body UBERON:0002097 skin of body UBERON:0002097 0.246049 11.886186
3278901 48b37086-25f7-4ecd-be66-f5bb378e3aea 10x 3' v2 EFO:0009899 basal cell of epidermis CL:0002187 20 month-old stage and over MmusDv:0000091 normal PATO:0000461 21-F-55 ... na female PATO:0000383 cell skin of body UBERON:0002097 skin of body UBERON:0002097 0.240724 10.307266
3278902 48b37086-25f7-4ecd-be66-f5bb378e3aea 10x 3' v2 EFO:0009899 epidermal cell CL:0000362 20 month-old stage and over MmusDv:0000091 normal PATO:0000461 21-F-55 ... na female PATO:0000383 cell skin of body UBERON:0002097 skin of body UBERON:0002097 0.278420 16.086994

9314 rows × 22 columns
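
The concatenation above depends on pandas aligning rows by index label rather than by position. The following toy sketch illustrates that behavior; the dataframes, IDs, and values are made up for illustration and are not Census data.

```python
import pandas as pd

# Hypothetical stand-in for obs: soma_joinid is still a regular column.
obs_toy = pd.DataFrame(
    {"soma_joinid": [10, 11, 12], "cell_type": ["a", "b", "c"]}
)

# Hypothetical stand-in for the mean_variance() result: already indexed
# by soma_joinid, deliberately in a different row order.
mv_toy = pd.DataFrame(
    {"mean": [0.5, 1.5, 2.5], "variance": [0.1, 0.2, 0.3]},
    index=pd.Index([12, 10, 11], name="soma_joinid"),
)

# Rows are matched by soma_joinid label, not by their position.
combined_toy = pd.concat([obs_toy.set_index("soma_joinid"), mv_toy], axis=1)
print(combined_toy)
```

Here the row labeled 12 receives the mean 0.5 even though it appears first in mv_toy and last in obs_toy, so differing row orders between the two dataframes do not mix up cells.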