Generating citations for Census slices
This notebook demonstrates how to generate a citation string for all datasets contained in a Census slice.
Contents
- Requirements
- Generating citation strings
- Via cell metadata query
- Via Seurat query
- Via SingleCellExperiment query
⚠️ Note that the Census RNA data includes duplicate cells present across multiple datasets. Duplicate cells can be filtered in or out using the cell metadata variable is_primary_data, which is described in the Census schema.
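If you request is_primary_data together with dataset_id, one way to drop duplicate copies is to subset the cell metadata before looking up citations, so the datasets you cite reflect only the cells you keep. The sketch below uses a small made-up data frame as a stand-in for a real query result; with a real query you would add "is_primary_data" to column_names.

```r
# Made-up stand-in for a cell metadata query result that also
# requested the is_primary_data column
cell_metadata_demo <- data.frame(
  dataset_id = c("ds-a", "ds-a", "ds-b", "ds-c"),
  cell_type = c("T cell", "B cell", "T cell", "T cell"),
  is_primary_data = c(TRUE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Keep only cells flagged as primary data, dropping duplicate copies
primary_cells <- cell_metadata_demo[cell_metadata_demo$is_primary_data, ]

# Datasets that still contribute cells after filtering
unique(primary_cells$dataset_id)
#> [1] "ds-a" "ds-b"
```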
Requirements
This notebook requires:
- cellxgene.census R package.
- Census data release with schema version 1.3.0 or greater.
Generating citation strings
First we open a handle to the Census data. To ensure we open a data release with schema version 1.3.0 or greater, we use census_version = "latest" and then display the release summary, which reports the schema version.
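library("tiledb")
library("cellxgene.census")
# Open a handle to the latest Census release
census <- open_soma(census_version = "latest")
# Read the release summary table and show it as a data frame
census_release_info <- census$get("census_info")$get("summary")$read()$concat()
as.data.frame(census_release_info)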
#> soma_joinid label value
#> 1 0 census_schema_version 2.1.0
#> 2 1 census_build_date 2024-07-15
#> 3 2 dataset_schema_version 5.1.0
#> 4 3 total_cell_count 117424316
#> 5 4 unique_cell_count 61873054
#> 6 5 number_donors_homo_sapiens 18042
#> 7 6 number_donors_mus_musculus 4256
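Since the rest of this notebook depends on schema version 1.3.0 or greater, you may prefer to check that requirement in code rather than by eye. The sketch below defines a hypothetical helper, has_min_schema(), that looks up the census_schema_version row of the summary table and compares it using base R's utils::compareVersion(). The small data frame is an assumed stand-in for as.data.frame(census_release_info), with two rows copied from the output above.

```r
# Stand-in for as.data.frame(census_release_info); values copied
# from the summary output shown above
release_info_demo <- data.frame(
  label = c("census_schema_version", "census_build_date"),
  value = c("2.1.0", "2024-07-15"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if the release schema is at least min_version
has_min_schema <- function(info, min_version = "1.3.0") {
  schema <- info$value[info$label == "census_schema_version"]
  stopifnot(length(schema) == 1)
  # compareVersion() returns 1, 0 or -1
  utils::compareVersion(schema, min_version) >= 0
}

has_min_schema(release_info_demo)
#> [1] TRUE
```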
Then we load the dataset table, which contains a column "citation" for each dataset included in Census.
datasets <- census$get("census_info")$get("datasets")$read()$concat()
datasets <- as.data.frame(datasets)
head(datasets["citation"])
#> citation
#> 1 Publication: https://doi.org/10.1002/hep4.1854 Dataset Version: https://datasets.cellxgene.cziscience.com/aaab3abd-624a-442e-b62b-3f2edb10b45e.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/44531dd9-1388-4416-a117-af0a99de2294
#> 2 Publication: https://doi.org/10.1126/sciimmunol.abe6291 Dataset Version: https://datasets.cellxgene.cziscience.com/50c1d621-995d-4386-9fcb-5c70fcdf8d66.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/3a2af25b-2338-4266-aad3-aa8d07473f50
#> 3 Publication: https://doi.org/10.1038/s41593-020-00764-7 Dataset Version: https://datasets.cellxgene.cziscience.com/e95b54b1-8656-4fe8-9f53-6fdd97f397ba.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/180bff9c-c8a5-4539-b13b-ddbc00d643e6
#> 4 Publication: https://doi.org/10.1038/s41467-022-29450-x Dataset Version: https://datasets.cellxgene.cziscience.com/d6e742c5-f6e5-42f4-8064-622783542f6b.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/bf325905-5e8e-42e3-933d-9a9053e9af80
#> 5 Publication: https://doi.org/10.1038/s41590-021-01059-0 Dataset Version: https://datasets.cellxgene.cziscience.com/61f15353-e598-43b5-bb5a-80ac44a0cf0b.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/93eebe82-d8c3-41bc-a906-63b5b5f24a9d
#> 6 Publication: https://doi.org/10.1016/j.celrep.2019.12.082 Dataset Version: https://datasets.cellxgene.cziscience.com/76b42c8c-9357-4c13-908f-8b757a0f8637.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/939769a8-d8d2-4d01-abfc-55699893fd49
Now we can use the column "dataset_id", present in both the dataset table and the Census cell metadata, to create citation strings for any Census slice.
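All three approaches in this notebook perform the same lookup: keep the rows of the dataset table whose dataset_id appears in the slice, then take their citation column. A minimal sketch of that step as a reusable function follows; slice_citations() is a hypothetical name, and the two small inputs are made-up stand-ins for the real dataset table and a slice's dataset_id column.

```r
# Hypothetical helper: citation strings for the datasets that contribute
# cells to a slice, one per dataset, in the order of the dataset table
slice_citations <- function(datasets, slice_dataset_ids) {
  keep <- datasets$dataset_id %in% unique(slice_dataset_ids)
  datasets$citation[keep]
}

# Made-up stand-ins for the dataset table and a slice's dataset_id column
datasets_demo <- data.frame(
  dataset_id = c("ds-a", "ds-b", "ds-c"),
  citation = c("Citation A", "Citation B", "Citation C"),
  stringsAsFactors = FALSE
)
cell_dataset_ids <- c("ds-c", "ds-a", "ds-a", "ds-c")

slice_citations(datasets_demo, cell_dataset_ids)
#> [1] "Citation A" "Citation C"
```

Because it only needs a vector of dataset IDs, the same helper could take cell_metadata$dataset_id, seurat_obj[[]]$dataset_id or sce_obj$dataset_id.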
Via cell metadata query
# Query cell metadata
cell_metadata <- census$get("census_data")$get("homo_sapiens")$obs$read(
value_filter = "tissue == 'cardiac atrium'",
column_names = c("dataset_id", "cell_type")
)
cell_metadata <- as.data.frame(cell_metadata$concat())
# Get a citation string for the slice
slice_datasets <- datasets[datasets$dataset_id %in% cell_metadata$dataset_id, ]
print(slice_datasets$citation)
#> [1] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/dbcbe0a6-918a-4440-9a56-6d03f0f22df5.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [2] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/07900e47-7ab4-48d4-a26e-abdd010f4bbf.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [3] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/cb872c2c-64a4-405f-96c3-03124405cc6c.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [4] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/3149e7d3-1ae4-4b59-a54b-73e9f591b699.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [5] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/50a18e6a-797b-40bd-aa07-6ed50a1f2cf6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [6] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/981bcf57-30cb-4a85-b905-e04373432fef.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
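To keep the citations alongside an analysis, one option is to write them to a plain-text file, one per line. In the sketch below the placeholder strings stand in for slice_datasets$citation, and the file location from tempfile() is only for illustration.

```r
# Placeholder strings standing in for slice_datasets$citation
citations_demo <- c("Citation A", "Citation B")

# Write one citation per line to a plain-text file (a temp file here)
out_file <- tempfile(fileext = ".txt")
writeLines(citations_demo, out_file)

# Read the file back to confirm its contents
readLines(out_file)
#> [1] "Citation A" "Citation B"
```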
Via Seurat query
# Fetch a Seurat object
seurat_obj <- get_seurat(
census = census,
organism = "homo_sapiens",
measurement_name = "RNA",
obs_value_filter = "tissue == 'cardiac atrium'",
var_value_filter = "feature_name == 'MYBPC3'",
obs_column_names = c("dataset_id", "cell_type")
)
# Get a citation string for the slice
slice_datasets <- datasets[datasets$dataset_id %in% seurat_obj[[]]$dataset_id, ]
print(slice_datasets$citation)
#> [1] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/dbcbe0a6-918a-4440-9a56-6d03f0f22df5.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [2] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/07900e47-7ab4-48d4-a26e-abdd010f4bbf.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [3] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/cb872c2c-64a4-405f-96c3-03124405cc6c.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [4] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/3149e7d3-1ae4-4b59-a54b-73e9f591b699.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [5] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/50a18e6a-797b-40bd-aa07-6ed50a1f2cf6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [6] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/981bcf57-30cb-4a85-b905-e04373432fef.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
Via SingleCellExperiment query
# Fetch a SingleCellExperiment object
sce_obj <- get_single_cell_experiment(
census = census,
organism = "homo_sapiens",
measurement_name = "RNA",
obs_value_filter = "tissue == 'cardiac atrium'",
var_value_filter = "feature_name == 'MYBPC3'",
obs_column_names = c("dataset_id", "cell_type")
)
# Get a citation string for the slice
slice_datasets <- datasets[datasets$dataset_id %in% sce_obj$dataset_id, ]
print(slice_datasets$citation)
#> [1] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/dbcbe0a6-918a-4440-9a56-6d03f0f22df5.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [2] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/07900e47-7ab4-48d4-a26e-abdd010f4bbf.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [3] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/cb872c2c-64a4-405f-96c3-03124405cc6c.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [4] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/3149e7d3-1ae4-4b59-a54b-73e9f591b699.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [5] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/50a18e6a-797b-40bd-aa07-6ed50a1f2cf6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
#> [6] "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/981bcf57-30cb-4a85-b905-e04373432fef.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
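In this slice, all six datasets point to the same publication. If you want to list each publication only once, a rough approach is to parse the citation strings. This sketch assumes that a citation begins with "Publication: <link> " whenever a publication exists, and it skips any string without that prefix. The citation_publications() helper and the example strings are hypothetical.

```r
# Made-up citations following the "Publication: <link> Dataset Version: ..."
# layout shown above; the third one has no publication prefix
citations_demo <- c(
  "Publication: https://doi.org/10.1000/xyz1 Dataset Version: https://example.org/a.h5ad",
  "Publication: https://doi.org/10.1000/xyz1 Dataset Version: https://example.org/b.h5ad",
  "Dataset Version: https://example.org/c.h5ad"
)

# Hypothetical helper: unique publication links, skipping citations
# that do not start with "Publication: "
citation_publications <- function(citations) {
  has_pub <- grepl("^Publication: ", citations)
  pubs <- sub("^Publication: (\\S+) .*$", "\\1", citations[has_pub])
  unique(pubs)
}

citation_publications(citations_demo)
#> [1] "https://doi.org/10.1000/xyz1"
```

This depends on the text layout rather than on a structured field, so it may need adjusting if the citation format changes.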