Generating citations for Census slices

This notebook demonstrates how to generate a citation string for all datasets contained in a Census slice.

Contents

  1. Requirements

  2. Generating citation strings

    1. Via cell metadata query

    2. Via AnnData query

⚠️ Note that the Census RNA data includes duplicate cells that appear in more than one dataset. Duplicate cells can be filtered in or out using the cell metadata variable is_primary_data, which is described in the Census schema.
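As a small illustration of the note above, one way to keep only primary cells is to append an is_primary_data condition to the value filter used in a Census query. This is a minimal sketch: the helper name primary_only is hypothetical, it only builds the filter string, and it assumes the base filter contains no top-level "or".

```python
def primary_only(value_filter: str = "") -> str:
    """Return value_filter restricted to cells flagged as primary data.

    Hypothetical helper: it builds a filter string and does not query Census.
    Assumes the base filter has no top-level "or" (no parentheses are added).
    """
    condition = "is_primary_data == True"
    value_filter = value_filter.strip()
    if not value_filter:
        return condition
    return f"{value_filter} and {condition}"


obs_filter = primary_only("tissue == 'cardiac atrium'")
print(obs_filter)
```

The resulting string could be passed as value_filter to get_obs or as obs_value_filter to get_anndata. Removing duplicate cells may change which datasets appear in a slice, and therefore which citations are generated.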

Requirements

This notebook requires:

  • cellxgene_census Python package.

  • Census data release with schema version 1.3.0 or greater.

Generating citation strings

First, we open a handle to the Census data. To ensure we open a data release with schema version 1.3.0 or greater, we use census_version="latest".

[1]:
import cellxgene_census

census = cellxgene_census.open_soma(census_version="latest")
census["census_info"]["summary"].read().concat().to_pandas()
[1]:
   soma_joinid  label                        value
0            0  census_schema_version        2.1.0
1            1  census_build_date            2024-06-21
2            2  dataset_schema_version       5.1.0
3            3  total_cell_count             117056090
4            4  unique_cell_count            61555904
5            5  number_donors_homo_sapiens   17978
6            6  number_donors_mus_musculus   4255
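The schema requirement can also be checked in code rather than by eye. The sketch below compares version strings numerically, since a plain string comparison would rank "1.10.0" below "1.3.0". The helper names parse_version and meets_minimum are hypothetical, and the summary dictionary is a hand-copied stand-in for two rows of the table above, not a value read from Census.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as "2.1.0" into a tuple of ints, (2, 1, 0)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(version: str, minimum: str = "1.3.0") -> bool:
    """Return True if version is at least minimum, comparing numerically."""
    return parse_version(version) >= parse_version(minimum)


# Stand-in for the label/value pairs shown in the summary table above.
summary = {"census_schema_version": "2.1.0", "census_build_date": "2024-06-21"}

print(meets_minimum(summary["census_schema_version"]))
```

With the real summary table, the same value could be looked up from the "label" and "value" columns shown above before calling meets_minimum.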

Then we load the dataset table, which contains a "citation" column with a citation string for each dataset included in Census.

[2]:
datasets = census["census_info"]["datasets"].read().concat().to_pandas()
datasets["citation"]
[2]:
0      Publication: https://doi.org/10.1002/hep4.1854...
1      Publication: https://doi.org/10.1126/sciimmuno...
2      Publication: https://doi.org/10.1038/s41593-02...
3      Publication: https://doi.org/10.1038/s41467-02...
4      Publication: https://doi.org/10.1038/s41590-02...
                             ...
826    Publication: https://doi.org/10.1038/s41586-02...
827    Publication: https://doi.org/10.1101/2023.05.0...
828    Publication: https://doi.org/10.1101/2023.05.0...
829    Publication: https://doi.org/10.1038/s41586-02...
830    Publication: https://doi.org/10.1038/s41586-02...
Name: citation, Length: 831, dtype: object

For cross-ref style citations, you can look at the column "collection_doi_label".

[3]:
datasets["collection_doi_label"]
[3]:
0      Andrews et al. (2022) Hepatology Communications
1                     King et al. (2021) Sci. Immunol.
2                      Leng et al. (2021) Nat Neurosci
3            Rodríguez-Ubreva et al. (2022) Nat Commun
4                     Triana et al. (2021) Nat Immunol
                            ...
826                           Qiu et al. (2024) Nature
827                      Gabitto et al. (2023) bioRxiv
828                      Gabitto et al. (2023) bioRxiv
829                           Qiu et al. (2024) Nature
830                           Qiu et al. (2024) Nature
Name: collection_doi_label, Length: 831, dtype: object

Now we can use the column "dataset_id", which is present in both the dataset table and the Census cell metadata, to create citation strings for any Census slice.
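The join described here can be sketched without Census access: collect the dataset IDs seen in the cell metadata, then keep the dataset rows whose ID is in that collection. The function name citations_for_slice, the IDs ("ds-a", "ds-b", "ds-c"), and the citation texts are made up for illustration and are not Census values.

```python
def citations_for_slice(dataset_rows, cell_dataset_ids):
    """Return citations of datasets that contribute at least one cell.

    dataset_rows: iterable of dicts with "dataset_id" and "citation" keys.
    cell_dataset_ids: iterable with one dataset ID per cell in the slice.
    The result follows the order of dataset_rows.
    """
    present = set(cell_dataset_ids)  # one membership test per dataset row
    return [row["citation"] for row in dataset_rows if row["dataset_id"] in present]


# Toy stand-ins for the dataset table and the per-cell "dataset_id" column.
dataset_rows = [
    {"dataset_id": "ds-a", "citation": "Publication: A"},
    {"dataset_id": "ds-b", "citation": "Publication: B"},
    {"dataset_id": "ds-c", "citation": "Publication: C"},
]
cell_dataset_ids = ["ds-c", "ds-a", "ds-a", "ds-c"]

print(citations_for_slice(dataset_rows, cell_dataset_ids))
```

Each dataset is listed once no matter how many cells it contributes, which is the same effect the isin filter has on the dataset table.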

Via cell metadata query

[4]:
# Query cell metadata
cell_metadata = cellxgene_census.get_obs(
    census, "homo_sapiens", value_filter="tissue == 'cardiac atrium'", column_names=["dataset_id", "cell_type"]
)

# Get a citation string for the slice
slice_datasets = datasets[datasets["dataset_id"].isin(cell_metadata["dataset_id"])]
print(*set(slice_datasets["citation"]), sep="\n\n")
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/981bcf57-30cb-4a85-b905-e04373432fef.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/07900e47-7ab4-48d4-a26e-abdd010f4bbf.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/50a18e6a-797b-40bd-aa07-6ed50a1f2cf6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/cb872c2c-64a4-405f-96c3-03124405cc6c.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/3149e7d3-1ae4-4b59-a54b-73e9f591b699.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/dbcbe0a6-918a-4440-9a56-6d03f0f22df5.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
[5]:
print(*set(slice_datasets["collection_doi_label"]), sep="\n\n")
The Tabula Sapiens Consortium* et al. (2022) Science
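Because set does not keep a stable order for strings across Python sessions, the printed citations can appear in a different order from run to run. If a reproducible order matters, for example when pasting citations into a manuscript, duplicates can be dropped while keeping the order of first appearance. This is a minimal sketch; the helper name unique_in_order is hypothetical, and the labels are copied from the collection_doi_label output shown earlier.

```python
def unique_in_order(values):
    """Drop duplicates while keeping the order of first appearance."""
    # dict keys are unique and preserve insertion order.
    return list(dict.fromkeys(values))


labels = [
    "Qiu et al. (2024) Nature",
    "Gabitto et al. (2023) bioRxiv",
    "Gabitto et al. (2023) bioRxiv",
    "Qiu et al. (2024) Nature",
]

print(*unique_in_order(labels), sep="\n\n")
```

The helper accepts any iterable, so a pandas column can be passed directly; pandas also provides Series.unique(), which returns values in order of appearance.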

Via AnnData query

[6]:
# Fetch an AnnData object
adata = cellxgene_census.get_anndata(
    census=census,
    organism="homo_sapiens",
    measurement_name="RNA",
    obs_value_filter="tissue == 'cardiac atrium'",
    var_value_filter="feature_name == 'MYBPC3'",
    obs_column_names=["dataset_id", "cell_type"],
)

# Get a citation string for the slice
slice_datasets = datasets[datasets["dataset_id"].isin(adata.obs["dataset_id"])]
print(*set(slice_datasets["citation"]), sep="\n\n")
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/981bcf57-30cb-4a85-b905-e04373432fef.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/07900e47-7ab4-48d4-a26e-abdd010f4bbf.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/50a18e6a-797b-40bd-aa07-6ed50a1f2cf6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/cb872c2c-64a4-405f-96c3-03124405cc6c.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/3149e7d3-1ae4-4b59-a54b-73e9f591b699.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/dbcbe0a6-918a-4440-9a56-6d03f0f22df5.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
[7]:
print(*set(slice_datasets["collection_doi_label"]), sep="\n\n")
The Tabula Sapiens Consortium* et al. (2022) Science
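All six citation strings above share one publication and one collection and differ only in the dataset version URL. If the individual parts are needed, for instance to list the publication once followed by its dataset versions, the string can be split with a regular expression. The pattern below is inferred from the strings printed in this notebook, not from a documented format, so it is an assumption: strings with a different layout return None. The name parse_citation is hypothetical.

```python
import re

# Layout inferred from the citation strings printed above (an assumption).
CITATION_PATTERN = re.compile(
    r"^Publication: (?P<publication>\S+) "
    r"Dataset Version: (?P<dataset_version>\S+) "
    r"curated and distributed by CZ CELLxGENE Discover in "
    r"Collection: (?P<collection>\S+)$"
)


def parse_citation(citation: str):
    """Split a citation string into its three URLs, or return None if it does not match."""
    match = CITATION_PATTERN.match(citation.strip())
    return match.groupdict() if match else None


# First citation string from the output above.
citation = (
    "Publication: https://doi.org/10.1126/science.abl4896 "
    "Dataset Version: https://datasets.cellxgene.cziscience.com/981bcf57-30cb-4a85-b905-e04373432fef.h5ad "
    "curated and distributed by CZ CELLxGENE Discover in "
    "Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
)

parts = parse_citation(citation)
print(parts["publication"])
print(parts["dataset_version"])
print(parts["collection"])
```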

Finally, don’t forget to close the Census handle.

[8]:
census.close()
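An explicit close() is easy to skip when an earlier step raises an error. The standard library's contextlib.closing calls close() on exit from a with block, whether or not an exception occurred. The sketch below uses a stand-in class, StandInHandle, which is hypothetical and only mimics the close() method used above, so that it runs without Census access.

```python
from contextlib import closing


class StandInHandle:
    """Stand-in for a Census handle; only the close() method matters here."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


handle = StandInHandle()
try:
    # closing() calls handle.close() when the block exits, even on error.
    with closing(handle) as census:
        raise RuntimeError("simulated failure while querying")
except RuntimeError:
    pass

print(handle.closed)  # True
```

In a real session, the object returned by cellxgene_census.open_soma would take the place of StandInHandle.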