This notebook demonstrates how to generate a citation string for all datasets contained in a Census slice.

Contents

  1. Requirements
  2. Generating citation strings
    1. Via cell metadata query
    2. Via Seurat query
    3. Via SingleCellExperiment query

⚠️ Note that the Census RNA data includes duplicate cells present across multiple datasets. Duplicate cells can be filtered in or out using the cell metadata variable is_primary_data, which is described in the Census schema.
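As a minimal sketch of what filtering on is_primary_data looks like once cell metadata is in a data frame, the example below uses a small made-up table (the name cell_metadata_toy and its values are hypothetical, not Census data). With real data, this assumes "is_primary_data" was included in the requested column names.

```r
# Toy stand-in for Census cell metadata (values are invented for illustration).
cell_metadata_toy <- data.frame(
  dataset_id = c("dataset-A", "dataset-A", "dataset-B", "dataset-B"),
  is_primary_data = c(TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only primary cells so that each cell is counted once.
primary_cells <- cell_metadata_toy[cell_metadata_toy$is_primary_data, ]
nrow(primary_cells)
#> [1] 3
```

Whether to drop non-primary cells before building citations is a judgment call: a dataset that only contributes duplicate cells to a slice would disappear from the citation list after filtering.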

Requirements

This notebook requires:

  • cellxgene.census R package.
  • Census data release with schema version 1.3.0 or greater.

Generating citation strings

First, we open a handle to the Census data. To ensure we open a data release with schema version 1.3.0 or greater, we use census_version = "latest".

library("tiledb")
library("cellxgene.census")

census <- open_soma(census_version = "latest")
census_release_info <- census$get("census_info")$get("summary")$read()$concat()
as.data.frame(census_release_info)
#>   soma_joinid                      label      value
#> 1           0      census_schema_version      2.0.0
#> 2           1          census_build_date 2024-04-15
#> 3           2     dataset_schema_version      5.0.0
#> 4           3           total_cell_count  114567967
#> 5           4          unique_cell_count   59902365
#> 6           5 number_donors_homo_sapiens      17132
#> 7           6 number_donors_mus_musculus       4186
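The schema requirement can also be checked programmatically. The sketch below works on a toy copy of the summary table (census_release_info_toy is a hypothetical name, and only two of the rows shown above are reproduced); with a live handle, the same lookup would presumably be applied to as.data.frame(census_release_info).

```r
# Toy copy of the summary table above, reduced to the columns used here.
census_release_info_toy <- data.frame(
  label = c("census_schema_version", "census_build_date"),
  value = c("2.0.0", "2024-04-15"),
  stringsAsFactors = FALSE
)

# Look up the schema version by its label.
schema_version <- census_release_info_toy$value[
  census_release_info_toy$label == "census_schema_version"
]

# package_version() compares version components numerically, not as text.
has_citations <- package_version(schema_version) >= package_version("1.3.0")
has_citations
#> [1] TRUE
```

Comparing with package_version() rather than plain strings avoids surprises such as "1.10.0" sorting before "1.3.0".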

Next, we load the datasets table, whose "citation" column holds a citation string for each dataset included in Census.

datasets <- census$get("census_info")$get("datasets")$read()$concat()
datasets <- as.data.frame(datasets)
head(datasets["citation"])
#>                                                                                                                                                                                                                                                                                                           citation
#> 1            Publication: https://doi.org/10.1002/hep4.1854 Dataset Version: https://datasets.cellxgene.cziscience.com/fb76c95f-0391-4fac-9fb9-082ce2430b59.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/44531dd9-1388-4416-a117-af0a99de2294
#> 2   Publication: https://doi.org/10.1126/sciimmunol.abe6291 Dataset Version: https://datasets.cellxgene.cziscience.com/b6737a5e-9069-4dd6-9a57-92e17a746df9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/3a2af25b-2338-4266-aad3-aa8d07473f50
#> 3   Publication: https://doi.org/10.1038/s41593-020-00764-7 Dataset Version: https://datasets.cellxgene.cziscience.com/0e02290f-b992-450b-8a19-554f73cd7f09.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/180bff9c-c8a5-4539-b13b-ddbc00d643e6
#> 4   Publication: https://doi.org/10.1038/s41467-022-29450-x Dataset Version: https://datasets.cellxgene.cziscience.com/40832710-d7b1-43fb-b2c2-1cd2255bc3ac.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/bf325905-5e8e-42e3-933d-9a9053e9af80
#> 5   Publication: https://doi.org/10.1038/s41590-021-01059-0 Dataset Version: https://datasets.cellxgene.cziscience.com/eb6c070c-ff67-4c1f-8d4d-65f9fe2119ee.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/93eebe82-d8c3-41bc-a906-63b5b5f24a9d
#> 6 Publication: https://doi.org/10.1016/j.celrep.2019.12.082 Dataset Version: https://datasets.cellxgene.cziscience.com/650a47be-6666-4f70-ac47-8414c50bbd8e.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/939769a8-d8d2-4d01-abfc-55699893fd49
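Each citation in the output above follows the same pattern: a publication DOI, a dataset version URL, and a collection URL. If the individual parts are needed separately, they can be pulled out with base R regular expressions. This is only a sketch: extract_field() is a hypothetical helper, and it assumes every citation uses the "Label: URL" layout shown here, returning NA when a label is absent.

```r
# First citation from the output above, copied verbatim.
citation <- paste(
  "Publication: https://doi.org/10.1002/hep4.1854",
  "Dataset Version: https://datasets.cellxgene.cziscience.com/fb76c95f-0391-4fac-9fb9-082ce2430b59.h5ad",
  "curated and distributed by CZ CELLxGENE Discover in",
  "Collection: https://cellxgene.cziscience.com/collections/44531dd9-1388-4416-a117-af0a99de2294"
)

# Hypothetical helper: return the URL that follows "<label>: ", or NA.
extract_field <- function(x, label) {
  pattern <- paste0("^.*", label, ": ([^[:space:]]+).*$")
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

fields <- c(
  publication = extract_field(citation, "Publication"),
  dataset_version = extract_field(citation, "Dataset Version"),
  collection = extract_field(citation, "Collection")
)
fields[["publication"]]
#> [1] "https://doi.org/10.1002/hep4.1854"
```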

We can now use the "dataset_id" column, present in both the datasets table and the Census cell metadata, to create citation strings for any Census slice.

Via cell metadata query

# Query cell metadata
cell_metadata <- census$get("census_data")$get("homo_sapiens")$obs$read(
  value_filter = "tissue == 'cardiac atrium'",
  column_names = c("dataset_id", "cell_type")
)

cell_metadata <- as.data.frame(cell_metadata$concat())

# Get a citation string for the slice
slice_datasets <- datasets[datasets$dataset_id %in% cell_metadata$dataset_id, ]
print(slice_datasets$citation)
#> [1] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/9227d155-6f2d-4534-be73-b86c5c34d8e6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [2] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/017c9ef2-a5e5-429e-a9a1-919e330c4087.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [3] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8c189c08-4eba-45d4-925f-a5fe1a13d2ae.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [4] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b76d37f6-0654-447f-bd1b-477be2c747f9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [5] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/860c49d4-8ab1-4576-b67e-02d66e4a6ddd.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [6] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b84def55-a776-4aa4-a9a6-7aab8b973086.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
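The result above is a character vector with one citation per dataset. If a single string is preferred, for example to paste into a manuscript, the vector can be collapsed. The sketch below uses toy stand-ins (datasets_toy and slice_ids are hypothetical names with invented values) so that it runs without a Census connection.

```r
# Toy stand-in for the datasets table.
datasets_toy <- data.frame(
  dataset_id = c("dataset-A", "dataset-B", "dataset-C"),
  citation = c("Citation for A", "Citation for B", "Citation for C"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the per-cell dataset_id column of a slice;
# IDs repeat because many cells come from the same dataset.
slice_ids <- c("dataset-A", "dataset-C", "dataset-A")

# Subsetting the datasets table yields one row per dataset in the slice.
matched <- datasets_toy[datasets_toy$dataset_id %in% slice_ids, ]

# unique() is a safeguard; collapse to one newline-separated string.
citation_string <- paste(unique(matched$citation), collapse = "\n")
cat(citation_string, "\n")
```

Because the lookup filters the datasets table rather than the cell metadata, the number of citations does not grow with the number of cells in the slice.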

Via Seurat query

# Fetch a Seurat object
seurat_obj <- get_seurat(
  census = census,
  organism = "homo_sapiens",
  measurement_name = "RNA",
  obs_value_filter = "tissue == 'cardiac atrium'",
  var_value_filter = "feature_name == 'MYBPC3'",
  obs_column_names = c("dataset_id", "cell_type")
)

# Get a citation string for the slice
slice_datasets <- datasets[datasets$dataset_id %in% seurat_obj[[]]$dataset_id, ]
print(slice_datasets$citation)
#> [1] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/9227d155-6f2d-4534-be73-b86c5c34d8e6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [2] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/017c9ef2-a5e5-429e-a9a1-919e330c4087.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [3] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8c189c08-4eba-45d4-925f-a5fe1a13d2ae.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [4] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b76d37f6-0654-447f-bd1b-477be2c747f9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [5] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/860c49d4-8ab1-4576-b67e-02d66e4a6ddd.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [6] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b84def55-a776-4aa4-a9a6-7aab8b973086.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"

Via SingleCellExperiment query

# Fetch a SingleCellExperiment object
sce_obj <- get_single_cell_experiment(
  census = census,
  organism = "homo_sapiens",
  measurement_name = "RNA",
  obs_value_filter = "tissue == 'cardiac atrium'",
  var_value_filter = "feature_name == 'MYBPC3'",
  obs_column_names = c("dataset_id", "cell_type")
)

# Get a citation string for the slice
slice_datasets <- datasets[datasets$dataset_id %in% sce_obj$dataset_id, ]
print(slice_datasets$citation)
#> [1] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/9227d155-6f2d-4534-be73-b86c5c34d8e6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [2] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/017c9ef2-a5e5-429e-a9a1-919e330c4087.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [3] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8c189c08-4eba-45d4-925f-a5fe1a13d2ae.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [4] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b76d37f6-0654-447f-bd1b-477be2c747f9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [5] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/860c49d4-8ab1-4576-b67e-02d66e4a6ddd.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [6] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b84def55-a776-4aa4-a9a6-7aab8b973086.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"