Generating citations for Census slices
This notebook demonstrates how to generate a citation string for all datasets contained in a Census slice.
Contents
Requirements
Generating citation strings
Via cell metadata query
Via AnnData query
⚠️ Note that the Census RNA data includes duplicate cells present across multiple datasets. Duplicate cells can be filtered in or out using the cell metadata variable is_primary_data, which is described in the Census schema.
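As a minimal sketch, the duplicate filter can be combined with the same kind of value filter used in the queries later in this notebook. The helper name `primary_only` is hypothetical and not part of cellxgene_census; it only builds a filter string.

```python
def primary_only(value_filter: str = "") -> str:
    """Return a value filter that also keeps only primary (non-duplicate) cells.

    Hypothetical helper for illustration; not part of cellxgene_census.
    """
    primary = "is_primary_data == True"
    # Parenthesize the caller's filter so any `or` clauses keep their meaning.
    return f"({value_filter}) and {primary}" if value_filter else primary


obs_filter = primary_only("tissue == 'cardiac atrium'")
print(obs_filter)
# The resulting string can be passed as value_filter= to obs.read(...)
# or as obs_value_filter= to cellxgene_census.get_anndata(...).
```

Whether duplicates should be dropped depends on the analysis: citations should cover every dataset that contributed cells to the slice actually used.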
Requirements
This notebook requires:
The cellxgene_census Python package.
A Census data release with schema version 1.3.0 or greater.
Generating citation strings
First we open a handle to the Census data. To ensure we open a data release with schema version 1.3.0 or greater, we use census_version="latest".
[1]:
import cellxgene_census
census = cellxgene_census.open_soma(census_version="latest")
census["census_info"]["summary"].read().concat().to_pandas()
[1]:
| | soma_joinid | label | value |
|---|---|---|---|
| 0 | 0 | census_schema_version | 1.3.0 |
| 1 | 1 | census_build_date | 2024-01-01 |
| 2 | 2 | dataset_schema_version | 4.0.0 |
| 3 | 3 | total_cell_count | 75694072 |
| 4 | 4 | unique_cell_count | 45846761 |
| 5 | 5 | number_donors_homo_sapiens | 16292 |
| 6 | 6 | number_donors_mus_musculus | 2153 |
Then we load the datasets table, whose "citation" column holds a citation string for each dataset included in Census.
[2]:
datasets = census["census_info"]["datasets"].read().concat().to_pandas()
datasets["citation"]
[2]:
0 Dataset Version: https://datasets.cellxgene.cz...
1 Dataset Version: https://datasets.cellxgene.cz...
2 Dataset Version: https://datasets.cellxgene.cz...
3 Dataset Version: https://datasets.cellxgene.cz...
4 Publication: https://doi.org/10.1002/ctm2.1356...
...
695 Publication: https://doi.org/10.1038/s41586-02...
696 Publication: https://doi.org/10.1038/s41586-02...
697 Publication: https://doi.org/10.1016/j.isci.20...
698 Publication: https://doi.org/10.1371/journal.p...
699 Publication: https://doi.org/10.1016/j.isci.20...
Name: citation, Length: 700, dtype: object
Now we can use the "dataset_id" column, which is present in both the datasets table and the Census cell metadata, to create citation strings for any Census slice.
Via cell metadata query
[3]:
# Query cell metadata
cell_metadata = census["census_data"]["homo_sapiens"].obs.read(
    value_filter="tissue == 'cardiac atrium'", column_names=["dataset_id", "cell_type"]
)
cell_metadata = cell_metadata.concat().to_pandas()
# Get a citation string for the slice
slice_datasets = datasets[datasets["dataset_id"].isin(cell_metadata["dataset_id"])]
print(*slice_datasets["citation"], sep="\n\n")
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/4866a804-37eb-436f-8c87-9cd585260061.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/bfd80f12-725c-4482-ad7f-1ed2b4909b0d.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/e6df8a57-f54f-413a-9d4d-dee03294d778.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8d599205-5c51-4b50-9d48-3dec31238587.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/f6065c51-bd26-4aa5-a05d-2805aeea48d9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8cdbf790-4d29-4f46-9aef-21adfb2e21da.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
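All six citations printed above share one publication, so a manuscript may only need that DOI listed once. The sketch below splits citation strings into their parts and collapses them to unique publications. The pattern is inferred from the strings shown in this notebook, not from a documented format, and the names `CITATION_RE`, `parse_citation`, and `unique_publications` are hypothetical.

```python
import re

# Assumed layout, inferred from the printed citations: an optional
# "Publication:" URL, then a "Dataset Version:" URL, then a "Collection:" URL.
CITATION_RE = re.compile(
    r"(?:Publication:\s*(?P<publication>\S+)\s+)?"
    r"Dataset Version:\s*(?P<dataset_version>\S+)"
    r"(?:.*?Collection:\s*(?P<collection>\S+))?",
    flags=re.DOTALL,
)


def parse_citation(citation: str) -> dict:
    """Split one citation string into its URLs; missing parts map to None."""
    match = CITATION_RE.search(citation)
    if match is None:
        raise ValueError(f"Unrecognized citation format: {citation!r}")
    return match.groupdict()


def unique_publications(citations) -> list:
    """Return publication URLs in first-seen order, without repeats."""
    seen = {}
    for citation in citations:
        publication = parse_citation(citation)["publication"]
        if publication is not None:
            seen.setdefault(publication, None)
    return list(seen)


# Two of the citation strings printed above, copied verbatim.
example = [
    "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/4866a804-37eb-436f-8c87-9cd585260061.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5",
    "Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/bfd80f12-725c-4482-ad7f-1ed2b4909b0d.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5",
]
print(unique_publications(example))
# ['https://doi.org/10.1126/science.abl4896']
```

In the notebook, `unique_publications(slice_datasets["citation"])` would apply the same logic to the real slice. Datasets without a publication are skipped here, so their dataset-version citations should still be reported separately.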
Via AnnData query
[4]:
# Fetch an AnnData object
adata = cellxgene_census.get_anndata(
    census=census,
    organism="homo_sapiens",
    measurement_name="RNA",
    obs_value_filter="tissue == 'cardiac atrium'",
    var_value_filter="feature_name == 'MYBPC3'",
    column_names={"obs": ["dataset_id", "cell_type"]},
)
# Get a citation string for the slice
slice_datasets = datasets[datasets["dataset_id"].isin(adata.obs["dataset_id"])]
print(*slice_datasets["citation"], sep="\n\n")
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/4866a804-37eb-436f-8c87-9cd585260061.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/bfd80f12-725c-4482-ad7f-1ed2b4909b0d.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/e6df8a57-f54f-413a-9d4d-dee03294d778.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8d599205-5c51-4b50-9d48-3dec31238587.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/f6065c51-bd26-4aa5-a05d-2805aeea48d9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8cdbf790-4d29-4f46-9aef-21adfb2e21da.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5
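When reporting a slice, it can also help to know how many cells each cited dataset contributed. The sketch below counts cells per dataset and pairs each count with its citation. It uses only the standard library and toy, made-up identifiers; the function name `cells_per_citation` is hypothetical.

```python
from collections import Counter


def cells_per_citation(cell_dataset_ids, citation_by_dataset_id):
    """Count cells per dataset and pair each count with that dataset's citation.

    cell_dataset_ids: iterable with one dataset_id per cell.
    citation_by_dataset_id: mapping of dataset_id -> citation string.
    Returns (dataset_id, n_cells, citation) tuples, largest dataset first.
    """
    counts = Counter(cell_dataset_ids)
    return [
        (dataset_id, n_cells, citation_by_dataset_id.get(dataset_id))
        for dataset_id, n_cells in counts.most_common()
    ]


# Toy, made-up values standing in for the per-cell dataset_id column
# and the dataset table; they are not real Census identifiers.
toy_ids = ["ds-a", "ds-b", "ds-a", "ds-a"]
toy_citations = {"ds-a": "citation A", "ds-b": "citation B"}
for row in cells_per_citation(toy_ids, toy_citations):
    print(row)
```

With the objects from this notebook, the inputs would be `adata.obs["dataset_id"]` and `dict(zip(datasets["dataset_id"], datasets["citation"]))`.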
And don’t forget to close the Census handle.
[6]:
census.close()