This vignette provides examples of basic axis metadata handling. The CZ CELLxGENE Census stores obs (cell) and var (gene) metadata in SOMA DataFrame objects via the TileDB-SOMA API (documentation), which can be queried and read as R data frames. The Census also has a convenience package that simplifies opening the Census.

R data frames are in-memory objects. Take care that queries are small enough for results to fit in memory.


  1. Opening the Census
  2. Summarizing cell metadata
    • Example: Summarize all cell types
    • Example: Summarize a subset of cell types, selected with a value_filter
  3. Full Census metadata stats

Opening the Census

The cellxgene.census R package contains a convenient API to open any version of the Census (by default, the newest stable version).

library("cellxgene.census")
census <- open_soma()

If you open the Census, you should close it with census$close(). This can be automated using on.exit(census$close(), add = TRUE) immediately after census <- open_soma().

You can learn more about the cellxgene.census methods by reading their documentation, for example with ?cellxgene.census::open_soma.
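Because on.exit() only takes effect inside a function, one way to apply the pattern above is to wrap the work in a small helper. The following is a minimal sketch, not part of the cellxgene.census API: the names with_census and fake_open are hypothetical, and the stand-in object exists only so the sketch can run without network access.

```r
# Hypothetical helper: open the Census, run fn(census), and always close it.
# The default opener is only evaluated when no open_fn is supplied.
with_census <- function(fn, open_fn = cellxgene.census::open_soma) {
  census <- open_fn()
  # Registered right after opening, so it also runs if fn() raises an error.
  on.exit(census$close(), add = TRUE)
  fn(census)
}

# Offline stand-in for a Census handle (an assumption for illustration only).
closed <- FALSE
fake_open <- function() list(close = function() closed <<- TRUE)

result <- with_census(function(census) "done", open_fn = fake_open)
```

With the real package installed, a call such as with_census(function(census) census$get("census_data")$names()) would follow the same shape.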

Summarizing cell metadata

Once the Census is open, you can use its TileDB-SOMA methods, because the Census is itself a SOMACollection. You can thus access the SOMADataFrame objects that encode cell and gene metadata.


  • You can read an entire SOMADataFrame into R using as.data.frame(soma_df$read()$concat()), where soma_df stands for the SOMADataFrame being read.
  • Queries will be much faster if you request only the DataFrame columns required for your analysis (e.g. column_names = c("soma_joinid", "cell_type_ontology_term_id")).
  • You can also further refine query results by using a value_filter, which will filter the census for matching records.

Example: Summarize all cell types

This example reads the cell metadata (obs) into an R data frame to summarize in a variety of ways.

human <- census$get("census_data")$get("homo_sapiens")

# Read obs into an R data frame (tibble).
obs_df <- human$obs$read(column_names = c("cell_type"))
obs_df <- as.data.frame(obs_df$concat())

# Find all unique values in the cell_type column.
unique_cell_type <- unique(obs_df$cell_type)

cat(
  "There are",
  length(unique_cell_type),
  "cell types in the Census! The first few are: ",
  paste(head(unique_cell_type), collapse = ", ")
)
#> There are 698 cell types in the Census! The first few are:  plasma cell, mature B cell, macrophage, follicular dendritic cell, plasmacytoid dendritic cell, conventional dendritic cell

Example: Summarize a subset of cell types, selected with a value_filter

This example utilizes a SOMA “value filter” to read the subset of cells with tissue_ontology_term_id equal to UBERON:0002048 (lung tissue), and summarizes the query result.

# Read cell_type terms for cells which have a specific tissue term
LUNG_TISSUE <- "UBERON:0002048"

obs_df <- human$obs$read(
  column_names = c("cell_type"),
  value_filter = paste0("tissue_ontology_term_id == '", LUNG_TISSUE, "'")
)
obs_df <- as.data.frame(obs_df$concat())

# Find all unique values in the cell_type column.
unique_cell_type <- unique(obs_df$cell_type)
cat(
  "There are ",
  length(unique_cell_type),
  " cell types in the Census where tissue_ontology_term_id == ",
  LUNG_TISSUE,
  "!\nThe first few are:",
  paste(head(unique_cell_type), collapse = ", "),
  "\n"
)
#> There are  202  cell types in the Census where tissue_ontology_term_id ==  UBERON:0002048 !
#> The first few are: Schwann cell, immature Schwann cell, neuron, Schwann cell precursor, neural progenitor cell, lung ciliated cell

# Report the 10 most common
top_10 <- sort(table(obs_df$cell_type), decreasing = TRUE)[1:10]
cat(
  "The top 10 cell types where tissue_ontology_term_id ==",
  LUNG_TISSUE,
  "are: ",
  paste(names(top_10), collapse = ", ")
)
#> The top 10 cell types where tissue_ontology_term_id == UBERON:0002048 are:  unknown, alveolar macrophage, CD8-positive, alpha-beta T cell, CD4-positive, alpha-beta T cell, macrophage, type II pneumocyte, classical monocyte, natural killer cell, malignant cell, epithelial cell of lower respiratory tract
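One detail of the summary above: indexing with [1:10] pads the result with NA entries when a query returns fewer than 10 distinct cell types. A small sketch on toy data (obs_toy is a made-up data frame, not Census output) shows head() as an alternative that returns at most the requested number of entries.

```r
# Toy stand-in for a query result; the values are invented for illustration.
obs_toy <- data.frame(
  cell_type = c("macrophage", "neuron", "macrophage",
                "unknown", "macrophage", "neuron")
)

# Count occurrences and order from most to least common.
counts <- sort(table(obs_toy$cell_type), decreasing = TRUE)

# head() keeps up to 10 entries and never pads with NA.
top <- head(counts, 10)
top_names <- names(top)
```

Here top holds only the three cell types present in the toy data, whereas counts[1:10] would have length 10 with seven missing entries.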

You can also define much more complex value filters. For example:

  • combine terms with & and |
  • use the %in% operator to query on multiple values

# You can also do more complex queries, such as testing for inclusion in a list of values
obs_df <- human$obs$read(
  column_names = c("cell_type_ontology_term_id"),
  value_filter = "tissue_ontology_term_id %in% c('UBERON:0002082', 'UBERON:0002084', 'UBERON:0002080')"
)

obs_df <- as.data.frame(obs_df$concat())

# Summarize
top_10 <- sort(table(obs_df$cell_type_ontology_term_id), decreasing = TRUE)[1:10]
print(top_10)
#> CL:0000746 CL:0008034 CL:0002131 CL:0002548 CL:0000115 CL:0000763 CL:0000057 CL:0000669 
#>     160974      99458      96953      79733      79626      35560      33075      27515 
#>    unknown CL:0002144 
#>      23613      18593
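Writing a long %in% filter by hand makes it easy to mistype an ontology ID, so it can help to build the filter string from an R vector. The sketch below is one possible way to do that; in_filter is a hypothetical helper, not a function of cellxgene.census or TileDB-SOMA, and it assumes the values contain no single quotes.

```r
# Hypothetical helper: build "<column> %in% c('a', 'b')" from a character vector.
in_filter <- function(column, values) {
  quoted <- paste0("'", values, "'", collapse = ", ")
  paste0(column, " %in% c(", quoted, ")")
}

tissues <- c("UBERON:0002082", "UBERON:0002080")
tissue_filter <- in_filter("tissue_ontology_term_id", tissues)

# Terms can then be combined with & or |, as described above.
combined_filter <- paste0(tissue_filter, " & cell_type == 'macrophage'")
```

The resulting strings can be passed as the value_filter argument of $read() in the same way as the hand-written filters in this vignette.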

Full Census metadata stats

This example queries all organisms in the Census, and summarizes the diversity of various metadata labels.

cols_to_query <- c(
  "cell_type_ontology_term_id",
  "assay_ontology_term_id",
  "tissue_ontology_term_id"
)

total_cells <- 0
for (organism in census$get("census_data")$names()) {
  print(organism)
  obs_df <- census$get("census_data")$get(organism)$obs$read(column_names = cols_to_query)
  obs_df <- as.data.frame(obs_df$concat())

  total_cells <- total_cells + nrow(obs_df)
  for (col in cols_to_query) {
    cat("  Unique ", col, " values: ", length(unique(obs_df[[col]])), "\n")
  }
}
#> [1] "homo_sapiens"
#>   Unique  cell_type_ontology_term_id  values:  698 
#>   Unique  assay_ontology_term_id  values:  24 
#>   Unique  tissue_ontology_term_id  values:  267 
#> [1] "mus_musculus"
#>   Unique  cell_type_ontology_term_id  values:  364 
#>   Unique  assay_ontology_term_id  values:  11 
#>   Unique  tissue_ontology_term_id  values:  84
cat("Complete Census contains ", total_cells, " cells.")
#> Complete Census contains  115556140  cells.
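
The loop above loads each organism's full obs table into memory with $concat(). If that is too large, the result of $read() can instead be consumed chunk by chunk. The following is a sketch under assumptions: it relies on the iterator exposing read_complete() and read_next(), as the TileDB-SOMA read iterator is documented to do; unique_streaming and make_fake_iter are hypothetical names, and the fake iterator exists only so the sketch runs offline.

```r
# Hypothetical helper: count rows and collect unique values of `col`
# one chunk at a time, without concatenating the whole table.
unique_streaming <- function(iter, col) {
  seen <- character(0)
  n_rows <- 0
  while (!iter$read_complete()) {
    chunk <- as.data.frame(iter$read_next())
    n_rows <- n_rows + nrow(chunk)
    seen <- union(seen, as.character(chunk[[col]]))
  }
  list(n_rows = n_rows, values = seen)
}

# Offline stand-in for an iterator (an assumption for illustration only).
make_fake_iter <- function(chunks) {
  i <- 0
  list(
    read_complete = function() i >= length(chunks),
    read_next = function() {
      i <<- i + 1
      chunks[[i]]
    }
  )
}

fake <- make_fake_iter(list(
  data.frame(cell_type = c("neuron", "macrophage")),
  data.frame(cell_type = c("macrophage", "T cell"))
))
res <- unique_streaming(fake, "cell_type")
```

With an open Census, the same helper could in principle be given human$obs$read(column_names = c("cell_type")) in place of the fake iterator; this has not been verified here against a live Census.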

Close the census

After use, the census object should be closed to release memory and other resources.

census$close()

This also closes all SOMA objects accessed via the top-level census. Closing can be automated using on.exit(census$close(), add = TRUE) immediately after census <- open_soma().