Generating citations for Census slices
Source:vignettes/census_citation_generation.Rmd
This notebook demonstrates how to generate a citation string for all datasets contained in a census slice.
Contents
- Requirements
- Generating citation strings
  - Via cell metadata query
  - Via Seurat query
  - Via SingleCellExperiment query
⚠️ Note that the Census RNA data includes duplicate cells present across multiple datasets. Duplicate cells can be filtered in or out using the cell metadata variable is_primary_data, which is described in the Census schema.
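As a minimal sketch of what filtering on is_primary_data amounts to, the example below uses a small hand-made data frame in place of real Census cell metadata. The rows, IDs, and values are invented for illustration only.

```r
# Toy stand-in for Census cell metadata; all values are illustrative.
obs <- data.frame(
  soma_joinid = 0:4,
  dataset_id = c("ds_a", "ds_a", "ds_b", "ds_b", "ds_c"),
  is_primary_data = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Keep only cells flagged as primary, dropping copies that
# also appear in another dataset.
primary_obs <- obs[obs$is_primary_data, ]
nrow(primary_obs)
```

In a real query the same condition would typically be added to the value_filter string rather than applied afterwards; check the Census documentation for the exact filter syntax.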
Requirements
This notebook requires:
- cellxgene.census R package.
- Census data release with schema version 1.3.0 or greater.
Generating citation strings
First we open a handle to the Census data. To ensure we open a data release with schema version 1.3.0 or greater, we use census_version = "latest":
library("tiledb")
library("cellxgene.census")
census <- open_soma(census_version = "latest")
census_release_info <- census$get("census_info")$get("summary")$read()$concat()
as.data.frame(census_release_info)
#> soma_joinid label value
#> 1 0 census_schema_version 2.0.0
#> 2 1 census_build_date 2024-04-15
#> 3 2 dataset_schema_version 5.0.0
#> 4 3 total_cell_count 114567967
#> 5 4 unique_cell_count 59902365
#> 6 5 number_donors_homo_sapiens 17132
#> 7 6 number_donors_mus_musculus 4186
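The schema requirement can also be checked programmatically rather than by eye. The sketch below rebuilds two rows of the summary table shown above as a plain data frame and compares the version with base R's numeric_version(). The helper name schema_at_least is hypothetical and not part of cellxgene.census.

```r
# Two rows of the summary table above, rebuilt by hand (label/value only).
census_release_info <- data.frame(
  label = c("census_schema_version", "census_build_date"),
  value = c("2.0.0", "2024-04-15"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when the release schema is at least `minimum`.
schema_at_least <- function(info, minimum = "1.3.0") {
  version <- info$value[info$label == "census_schema_version"]
  numeric_version(version) >= numeric_version(minimum)
}

schema_at_least(census_release_info)
```

Comparing with numeric_version() avoids the pitfalls of comparing version strings alphabetically, where "1.10.0" would sort before "1.3.0".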
Then we load the dataset table, which contains a column "citation" holding a citation string for each dataset included in Census.
datasets <- census$get("census_info")$get("datasets")$read()$concat()
datasets <- as.data.frame(datasets)
head(datasets["citation"])
#> citation
#> 1 Publication: https://doi.org/10.1002/hep4.1854 Dataset Version: https://datasets.cellxgene.cziscience.com/fb76c95f-0391-4fac-9fb9-082ce2430b59.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/44531dd9-1388-4416-a117-af0a99de2294
#> 2 Publication: https://doi.org/10.1126/sciimmunol.abe6291 Dataset Version: https://datasets.cellxgene.cziscience.com/b6737a5e-9069-4dd6-9a57-92e17a746df9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/3a2af25b-2338-4266-aad3-aa8d07473f50
#> 3 Publication: https://doi.org/10.1038/s41593-020-00764-7 Dataset Version: https://datasets.cellxgene.cziscience.com/0e02290f-b992-450b-8a19-554f73cd7f09.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/180bff9c-c8a5-4539-b13b-ddbc00d643e6
#> 4 Publication: https://doi.org/10.1038/s41467-022-29450-x Dataset Version: https://datasets.cellxgene.cziscience.com/40832710-d7b1-43fb-b2c2-1cd2255bc3ac.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/bf325905-5e8e-42e3-933d-9a9053e9af80
#> 5 Publication: https://doi.org/10.1038/s41590-021-01059-0 Dataset Version: https://datasets.cellxgene.cziscience.com/eb6c070c-ff67-4c1f-8d4d-65f9fe2119ee.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/93eebe82-d8c3-41bc-a906-63b5b5f24a9d
#> 6 Publication: https://doi.org/10.1016/j.celrep.2019.12.082 Dataset Version: https://datasets.cellxgene.cziscience.com/650a47be-6666-4f70-ac47-8414c50bbd8e.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/939769a8-d8d2-4d01-abfc-55699893fd49
Now we can use the column "dataset_id", present in both the dataset table and the Census cell metadata, to create citation strings for any Census slice.
Via cell metadata query
# Query cell metadata
cell_metadata <- census$get("census_data")$get("homo_sapiens")$obs$read(
  value_filter = "tissue == 'cardiac atrium'",
  column_names = c("dataset_id", "cell_type")
)
cell_metadata <- as.data.frame(cell_metadata$concat())
# Get a citation string for the slice
slice_datasets <- datasets[datasets$dataset_id %in% cell_metadata$dataset_id, ]
print(slice_datasets$citation)
#> [1] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/9227d155-6f2d-4534-be73-b86c5c34d8e6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [2] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/017c9ef2-a5e5-429e-a9a1-919e330c4087.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [3] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8c189c08-4eba-45d4-925f-a5fe1a13d2ae.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [4] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b76d37f6-0654-447f-bd1b-477be2c747f9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [5] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/860c49d4-8ab1-4576-b67e-02d66e4a6ddd.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [6] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b84def55-a776-4aa4-a9a6-7aab8b973086.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
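The result above is one citation per dataset, and several datasets can share a publication. The sketch below shows one way to collapse the citations into a single block of text and to list the distinct publication URLs. It runs on toy tables whose IDs, URLs, and citation text are invented, and it assumes each citation starts with "Publication: " followed by a URL, as in the output above.

```r
# Toy stand-ins for the dataset table and cell metadata; all values invented.
datasets <- data.frame(
  dataset_id = c("ds_a", "ds_b", "ds_c"),
  citation = c(
    "Publication: https://example.org/paper-1 Dataset Version: a.h5ad",
    "Publication: https://example.org/paper-1 Dataset Version: b.h5ad",
    "Publication: https://example.org/paper-2 Dataset Version: c.h5ad"
  ),
  stringsAsFactors = FALSE
)
cell_metadata <- data.frame(
  dataset_id = c("ds_a", "ds_a", "ds_b"),
  cell_type = c("cardiac muscle cell", "fibroblast", "fibroblast"),
  stringsAsFactors = FALSE
)

# Same subsetting step as in the query above.
slice_datasets <- datasets[datasets$dataset_id %in% cell_metadata$dataset_id, ]

# One citation per line, ready to paste into a manuscript.
citation_text <- paste(unique(slice_datasets$citation), collapse = "\n")

# Distinct publication URLs: the token right after "Publication: ".
# Strings that do not match this assumed prefix are returned unchanged.
publications <- unique(
  sub("^Publication: ([^ ]+).*$", "\\1", slice_datasets$citation)
)

cat(citation_text, "\n")
publications
```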
Via Seurat query
# Fetch a Seurat object
seurat_obj <- get_seurat(
  census = census,
  organism = "homo_sapiens",
  measurement_name = "RNA",
  obs_value_filter = "tissue == 'cardiac atrium'",
  var_value_filter = "feature_name == 'MYBPC3'",
  obs_column_names = c("dataset_id", "cell_type")
)
# Get a citation string for the slice
slice_datasets <- datasets[datasets$dataset_id %in% seurat_obj[[]]$dataset_id, ]
print(slice_datasets$citation)
#> [1] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/9227d155-6f2d-4534-be73-b86c5c34d8e6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [2] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/017c9ef2-a5e5-429e-a9a1-919e330c4087.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [3] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8c189c08-4eba-45d4-925f-a5fe1a13d2ae.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [4] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b76d37f6-0654-447f-bd1b-477be2c747f9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [5] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/860c49d4-8ab1-4576-b67e-02d66e4a6ddd.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [6] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b84def55-a776-4aa4-a9a6-7aab8b973086.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
Via SingleCellExperiment query
# Fetch a SingleCellExperiment object
sce_obj <- get_single_cell_experiment(
  census = census,
  organism = "homo_sapiens",
  measurement_name = "RNA",
  obs_value_filter = "tissue == 'cardiac atrium'",
  var_value_filter = "feature_name == 'MYBPC3'",
  obs_column_names = c("dataset_id", "cell_type")
)
# Get a citation string for the slice
slice_datasets <- datasets[datasets$dataset_id %in% sce_obj$dataset_id, ]
print(slice_datasets$citation)
#> [1] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/9227d155-6f2d-4534-be73-b86c5c34d8e6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [2] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/017c9ef2-a5e5-429e-a9a1-919e330c4087.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [3] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8c189c08-4eba-45d4-925f-a5fe1a13d2ae.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [4] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b76d37f6-0654-447f-bd1b-477be2c747f9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [5] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/860c49d4-8ab1-4576-b67e-02d66e4a6ddd.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [6] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b84def55-a776-4aa4-a9a6-7aab8b973086.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
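All three approaches repeat the same lookup: take the dataset_id values found in the slice and select the matching rows of the dataset table. That step can be wrapped in a small function, sketched below. The name citations_for_slice is hypothetical and not part of cellxgene.census, and the dataset table here is a toy one with invented IDs and citation text.

```r
# Hypothetical helper (not part of cellxgene.census): return the unique
# citation strings for whatever dataset IDs a slice contains.
citations_for_slice <- function(datasets, dataset_ids) {
  # as.character() guards against dataset_id being stored as a factor.
  ids <- unique(as.character(dataset_ids))
  unique(datasets$citation[datasets$dataset_id %in% ids])
}

# Toy dataset table; IDs and citation text are invented for illustration.
datasets <- data.frame(
  dataset_id = c("ds_a", "ds_b", "ds_c"),
  citation = c("Citation A", "Citation B", "Citation C"),
  stringsAsFactors = FALSE
)

citations_for_slice(datasets, c("ds_c", "ds_a", "ds_a"))
#> [1] "Citation A" "Citation C"
```

With the objects created in this notebook, the second argument would be cell_metadata$dataset_id, seurat_obj[[]]$dataset_id, or sce_obj$dataset_id. The result follows the row order of the dataset table, not the order of the IDs in the slice.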