Gene expression values are pulled from
anndata.X. These feed into the histograms, scatterplot, colorscale, and differential expression calculations. We’re working on ways to incorporate
anndata.raw and other
Categorical (e.g., cluster labels) and continuous (e.g., pseudotime) metadata are pulled from
anndata.obs. Any column added here will be available for visualization in cellxgene. You can also create new categorical annotations within the application.
cellxgene looks for embeddings (e.g., tSNE, UMAP, PCA, spatial coordinates) in
anndata.obsm. These fields must follow the scanpy convention of starting with
anndata.obsm['X_umap']. If an embedding has more than two components, the first two will be used for visualization.
We’re actively working on how to improve differential expression within the app. N.B.: the current implementation assumes normally distributed values on a linear scale.
Currently, we use a Welch’s t-test, which assumes that the two populations are each normally distributed, but may have unequal variance. We use a two-sided t-test against the null hypothesis that the two populations have equal means. P-values are adjusted with the Bonferroni corrrection.
To help avoid spurious results, we use the log fold change to filter genes, retaining those where
|log2( mean(set1) / mean(set2) )| > 0.01; this threshold can be configured with the
--diffexp-lfc-cutoff command. We then sort genes by their associated
|t value| and return the top 15 genes.