Generating citations for Census slices
This notebook demonstrates how to generate a citation string for all datasets contained in a Census slice.
Contents
Requirements
Generating citation strings
Via cell metadata query
Via AnnData query
⚠️ Note that the Census RNA data includes duplicate cells present across multiple datasets. Duplicate cells can be filtered in or out using the cell metadata variable is_primary_data, which is described in the Census schema.
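One way to apply this filter is to add an is_primary_data condition to the value filter used in a query. The sketch below only builds the filter string. The helper name primary_only is hypothetical (it is not part of cellxgene_census), and it assumes the incoming filter has no top-level "or", since the new condition is simply joined with "and".

```python
def primary_only(value_filter: str = "") -> str:
    """Append an is_primary_data condition to a value filter string.

    Hypothetical helper for illustration; not part of cellxgene_census.
    Assumes the incoming filter contains no top-level `or`.
    """
    condition = "is_primary_data == True"
    value_filter = value_filter.strip()
    if not value_filter:
        # No existing filter: the condition stands on its own.
        return condition
    return f"{value_filter} and {condition}"


obs_filter = primary_only("tissue == 'cardiac atrium'")
print(obs_filter)
```

The resulting string could then be passed as value_filter to get_obs or as obs_value_filter to get_anndata, so that the slice contains each cell only once.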
Requirements
This notebook requires:
cellxgene_census Python package.
Census data release with schema version 1.3.0 or greater.
Generating citation strings
First, we open a handle to the Census data. To ensure we open a data release with schema version 1.3.0 or greater, we use census_version="latest".
[1]:
import cellxgene_census
census = cellxgene_census.open_soma(census_version="latest")
census["census_info"]["summary"].read().concat().to_pandas()
/opt/anaconda3/envs/census_latest/lib/python3.10/site-packages/tiledb/cloud/config.py:96: UserWarning: You must first login before you can run commands. Please run tiledb.cloud.login.
warnings.warn(
[1]:
| | soma_joinid | label | value |
|---|---|---|---|
| 0 | 0 | census_schema_version | 2.1.0 |
| 1 | 1 | census_build_date | 2024-06-21 |
| 2 | 2 | dataset_schema_version | 5.1.0 |
| 3 | 3 | total_cell_count | 117056090 |
| 4 | 4 | unique_cell_count | 61555904 |
| 5 | 5 | number_donors_homo_sapiens | 17978 |
| 6 | 6 | number_donors_mus_musculus | 4255 |
Then we load the datasets table, which contains a "citation" column with a citation string for each dataset included in Census.
[2]:
datasets = census["census_info"]["datasets"].read().concat().to_pandas()
datasets["citation"]
[2]:
0 Publication: https://doi.org/10.1002/hep4.1854...
1 Publication: https://doi.org/10.1126/sciimmuno...
2 Publication: https://doi.org/10.1038/s41593-02...
3 Publication: https://doi.org/10.1038/s41467-02...
4 Publication: https://doi.org/10.1038/s41590-02...
...
826 Publication: https://doi.org/10.1038/s41586-02...
827 Publication: https://doi.org/10.1101/2023.05.0...
828 Publication: https://doi.org/10.1101/2023.05.0...
829 Publication: https://doi.org/10.1038/s41586-02...
830 Publication: https://doi.org/10.1038/s41586-02...
Name: citation, Length: 831, dtype: object
For cross-ref style citations, you can look at the column "collection_doi_label".
[3]:
datasets["collection_doi_label"]
[3]:
0 Andrews et al. (2022) Hepatology Communications
1 King et al. (2021) Sci. Immunol.
2 Leng et al. (2021) Nat Neurosci
3 Rodríguez-Ubreva et al. (2022) Nat Commun
4 Triana et al. (2021) Nat Immunol
...
826 Qiu et al. (2024) Nature
827 Gabitto et al. (2023) bioRxiv
828 Gabitto et al. (2023) bioRxiv
829 Qiu et al. (2024) Nature
830 Qiu et al. (2024) Nature
Name: collection_doi_label, Length: 831, dtype: object
We can now use the column "dataset_id", which is present in both the datasets table and the Census cell metadata, to create citation strings for any Census slice.
Via cell metadata query
[4]:
# Query cell metadata
cell_metadata = cellxgene_census.get_obs(
    census, "homo_sapiens", value_filter="tissue == 'cardiac atrium'", column_names=["dataset_id", "cell_type"]
)
# Get a citation string for the slice
slice_datasets = datasets[datasets["dataset_id"].isin(cell_metadata["dataset_id"])]
print(*set(slice_datasets["citation"]), sep="\n\n")
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/981bcf57-30cb-4a85-b905-e04373432fef.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/07900e47-7ab4-48d4-a26e-abdd010f4bbf.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/50a18e6a-797b-40bd-aa07-6ed50a1f2cf6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/cb872c2c-64a4-405f-96c3-03124405cc6c.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/3149e7d3-1ae4-4b59-a54b-73e9f591b699.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/dbcbe0a6-918a-4440-9a56-6d03f0f22df5.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
[5]:
print(*set(slice_datasets["collection_doi_label"]), sep="\n\n")
The Tabula Sapiens Consortium* et al. (2022) Science
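When a slice spans several datasets, it can be useful to report how many cells each dataset contributes next to its citation label. The following is a minimal sketch using only the standard library and toy data: the dataset IDs and labels are made up, and cells_per_dataset is a hypothetical helper rather than a Census function.

```python
from collections import Counter


def cells_per_dataset(dataset_ids, labels_by_id):
    """Return (label, dataset_id, n_cells) tuples, largest contribution first.

    Hypothetical helper for illustration. `dataset_ids` is any iterable with
    one dataset ID per cell; `labels_by_id` maps dataset ID to a citation label.
    """
    counts = Counter(dataset_ids)
    return [
        (labels_by_id.get(dataset_id, "unknown"), dataset_id, n_cells)
        for dataset_id, n_cells in counts.most_common()
    ]


# Toy stand-ins for the per-cell dataset IDs and the datasets table.
toy_dataset_ids = ["ds-a", "ds-b", "ds-a", "ds-a", "ds-b", "ds-c"]
toy_labels = {
    "ds-a": "Author A et al. (2022) Journal X",
    "ds-b": "Author B et al. (2023) Journal Y",
}

summary = cells_per_dataset(toy_dataset_ids, toy_labels)
for label, dataset_id, n_cells in summary:
    print(f"{n_cells:>3} cells  {dataset_id}  {label}")
```

With real data, the per-cell IDs could come from cell_metadata["dataset_id"], and the lookup could be built with dict(zip(datasets["dataset_id"], datasets["collection_doi_label"])).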
Via AnnData query
[6]:
# Fetch an AnnData object
adata = cellxgene_census.get_anndata(
    census=census,
    organism="homo_sapiens",
    measurement_name="RNA",
    obs_value_filter="tissue == 'cardiac atrium'",
    var_value_filter="feature_name == 'MYBPC3'",
    obs_column_names=["dataset_id", "cell_type"],
)
# Get a citation string for the slice
slice_datasets = datasets[datasets["dataset_id"].isin(adata.obs["dataset_id"])]
print(*set(slice_datasets["citation"]), sep="\n\n")
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/981bcf57-30cb-4a85-b905-e04373432fef.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/07900e47-7ab4-48d4-a26e-abdd010f4bbf.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/50a18e6a-797b-40bd-aa07-6ed50a1f2cf6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/cb872c2c-64a4-405f-96c3-03124405cc6c.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/3149e7d3-1ae4-4b59-a54b-73e9f591b699.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/dbcbe0a6-918a-4440-9a56-6d03f0f22df5.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
[7]:
print(*set(slice_datasets["collection_doi_label"]), sep="\n\n")
The Tabula Sapiens Consortium* et al. (2022) Science
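The citation strings printed above share one layout: a publication DOI, a dataset version URL, and a collection URL. When many datasets come from the same publication, grouping them by DOI gives a more compact reference list. The sketch below assumes that layout holds for the strings being parsed, which is inferred only from the output shown in this notebook; strings that do not match are returned separately rather than dropped. The names CITATION_PATTERN and group_by_publication are hypothetical.

```python
import re
from collections import defaultdict

# Layout assumed from the citation strings printed in this notebook.
CITATION_PATTERN = re.compile(
    r"Publication:\s*(?P<publication>\S+)\s+"
    r"Dataset Version:\s*(?P<dataset_version>\S+)\s+"
    r".*?Collection:\s*(?P<collection>\S+)"
)


def group_by_publication(citations):
    """Group dataset version URLs by publication DOI.

    Returns (grouped, unparsed): a dict mapping each publication URL to its
    dataset version URLs, and a list of strings that did not match the layout.
    """
    grouped = defaultdict(list)
    unparsed = []
    for citation in citations:
        match = CITATION_PATTERN.search(citation)
        if match is None:
            unparsed.append(citation)
        else:
            grouped[match["publication"]].append(match["dataset_version"])
    return dict(grouped), unparsed


# Two of the citation strings from the output above.
example_citations = [
    "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/981bcf57-30cb-4a85-b905-e04373432fef.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5",
    "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/07900e47-7ab4-48d4-a26e-abdd010f4bbf.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5",
]

grouped, unparsed = group_by_publication(example_citations)
for publication, versions in grouped.items():
    print(f"{publication}: {len(versions)} dataset version(s)")
```

With real data, set(slice_datasets["citation"]) could be passed in place of example_citations. Checking the unparsed list is worthwhile, since not every dataset is guaranteed to follow the layout assumed here.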
Finally, don’t forget to close the Census handle.
[8]:
census.close()