Genes measured in each cell (dataset presence matrix)
Source: vignettes/census_dataset_presence.Rmd
The Census is a compilation of cells from multiple datasets that may differ in the sets of genes they measure. This notebook shows how to identify the genes measured in each dataset.
The presence matrix is a sparse boolean array, indicating which features (var) were present in each dataset. The array has dimensions [n_datasets, n_var], and is stored in the SOMA Measurement varp collection. The first dimension is indexed by the soma_joinid in the census_datasets dataframe. The second is indexed by the soma_joinid in the var dataframe of the measurement.
As a reminder, the obs data frame has a column dataset_id that can be used to link any cell in the Census to the presence matrix.
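The link from a cell to its row of the presence matrix can be sketched with toy data. Everything below is made up for illustration (the tables, the "ds-*" identifiers, and the matrix values are assumptions, not Census data); only the column names dataset_id and soma_joinid come from the description above.

```r
library(Matrix)

# Toy stand-ins for the Census tables (hypothetical values)
toy_datasets <- data.frame(
  soma_joinid = 0:2,
  dataset_id = c("ds-a", "ds-b", "ds-c"),
  stringsAsFactors = FALSE
)
toy_obs <- data.frame(
  soma_joinid = 0:3,
  dataset_id = c("ds-b", "ds-b", "ds-a", "ds-c"),
  stringsAsFactors = FALSE
)

# Toy presence matrix: 3 datasets x 4 genes, TRUE where the gene was measured
toy_presence <- sparseMatrix(
  i = c(1, 1, 2, 2, 2, 3),
  j = c(1, 2, 1, 3, 4, 4),
  x = TRUE,
  dims = c(3, 4)
)

# Map each cell to the soma_joinid of the dataset it came from
cell_dataset_joinid <- toy_datasets$soma_joinid[
  match(toy_obs$dataset_id, toy_datasets$dataset_id)
]

# soma_joinid is zero-based while R rows are one-based, so add 1
genes_for_first_cell <- which(toy_presence[cell_dataset_joinid[1] + 1, ])
print(genes_for_first_cell)
```

The result holds one-based column positions of the toy matrix; subtracting 1 would turn them back into feature joinids.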
Contents
- Opening the Census.
- Fetching the IDs of the Census datasets.
- Fetching the dataset presence matrix.
- Identifying genes measured in a specific dataset.
- Identifying datasets that measured specific genes.
- Identifying all genes measured in a dataset.
Opening the Census
The cellxgene.census R package contains a convenient API to open any version of the Census (by default, the newest stable version).
library("cellxgene.census")
census <- open_soma()
Fetching the IDs of the Census datasets
Let’s grab a table of all the datasets included in the Census and use this table in combination with the presence matrix below.
# Grab the experiment containing human data, and the measurement therein with RNA
human <- census$get("census_data")$get("homo_sapiens")
human_rna <- human$ms$get("RNA")
# The census-wide datasets
datasets_df <- as.data.frame(census$get("census_info")$get("datasets")$read()$concat())
print(datasets_df)
#> soma_joinid
#> 1 0
#> 2 1
#> 3 2
#> 4 3
#> 5 4
#> 6 5
#> 7 6
#> 8 7
#> 9 8
#> 10 9
#> citation
#> 1 Publication: https://doi.org/10.1002/hep4.1854 Dataset Version: https://datasets.cellxgene.cziscience.com/fb76c95f-0391-4fac-9fb9-082ce2430b59.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/44531dd9-1388-4416-a117-af0a99de2294
#> 2 Publication: https://doi.org/10.1126/sciimmunol.abe6291 Dataset Version: https://datasets.cellxgene.cziscience.com/b6737a5e-9069-4dd6-9a57-92e17a746df9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/3a2af25b-2338-4266-aad3-aa8d07473f50
#> 3 Publication: https://doi.org/10.1038/s41593-020-00764-7 Dataset Version: https://datasets.cellxgene.cziscience.com/0e02290f-b992-450b-8a19-554f73cd7f09.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/180bff9c-c8a5-4539-b13b-ddbc00d643e6
#> 4 Publication: https://doi.org/10.1038/s41467-022-29450-x Dataset Version: https://datasets.cellxgene.cziscience.com/40832710-d7b1-43fb-b2c2-1cd2255bc3ac.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/bf325905-5e8e-42e3-933d-9a9053e9af80
#> 5 Publication: https://doi.org/10.1038/s41590-021-01059-0 Dataset Version: https://datasets.cellxgene.cziscience.com/eb6c070c-ff67-4c1f-8d4d-65f9fe2119ee.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/93eebe82-d8c3-41bc-a906-63b5b5f24a9d
#> 6 Publication: https://doi.org/10.1016/j.celrep.2019.12.082 Dataset Version: https://datasets.cellxgene.cziscience.com/650a47be-6666-4f70-ac47-8414c50bbd8e.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/939769a8-d8d2-4d01-abfc-55699893fd49
#> 7 Publication: https://doi.org/10.1038/s41593-020-00764-7 Dataset Version: https://datasets.cellxgene.cziscience.com/1f0cd8ed-94c6-440c-bd5b-bad55e2666b1.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/180bff9c-c8a5-4539-b13b-ddbc00d643e6
#> 8 Publication: https://doi.org/10.1016/j.cell.2022.11.005 Dataset Version: https://datasets.cellxgene.cziscience.com/086c8f4e-3fe7-46e5-8b6b-a8cb7f92dadd.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/2d2e2acd-dade-489f-a2da-6c11aa654028
#> 9 Publication: https://doi.org/10.1016/j.jhep.2023.12.023 Dataset Version: https://datasets.cellxgene.cziscience.com/7e88c4aa-d95f-43e3-bc8b-cf02629d7301.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/0c8a364b-97b5-4cc8-a593-23c38c6f0ac5
#> 10 Publication: https://doi.org/10.1002/hep4.1854 Dataset Version: https://datasets.cellxgene.cziscience.com/575dd70a-45b1-4a82-8acd-467d313e4c66.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/44531dd9-1388-4416-a117-af0a99de2294
#> collection_id
#> 1 44531dd9-1388-4416-a117-af0a99de2294
#> 2 3a2af25b-2338-4266-aad3-aa8d07473f50
#> 3 180bff9c-c8a5-4539-b13b-ddbc00d643e6
#> 4 bf325905-5e8e-42e3-933d-9a9053e9af80
#> 5 93eebe82-d8c3-41bc-a906-63b5b5f24a9d
#> 6 939769a8-d8d2-4d01-abfc-55699893fd49
#> 7 180bff9c-c8a5-4539-b13b-ddbc00d643e6
#> 8 2d2e2acd-dade-489f-a2da-6c11aa654028
#> 9 0c8a364b-97b5-4cc8-a593-23c38c6f0ac5
#> 10 44531dd9-1388-4416-a117-af0a99de2294
#> collection_name
#> 1 Single-Cell, Single-Nucleus, and Spatial RNA Sequencing of the Human Liver Identifies Cholangiocyte and Mesenchymal Heterogeneity
#> 2 Single-cell analysis of human B cell maturation predicts how antibody class switching shapes selection dynamics
#> 3 Molecular characterization of selectively vulnerable neurons in Alzheimer's Disease
#> 4 Single-cell Atlas of common variable immunodeficiency shows germinal center-associated epigenetic dysregulation in B-cell responses
#> 5 Single-cell proteo-genomic reference maps of the hematopoietic system enable the purification and massive profiling of precisely defined cell states
#> 6 Integration of eQTL and a Single-Cell Atlas in the Human Eye Identifies Causal Genes for Age-Related Macular Degeneration
#> 7 Molecular characterization of selectively vulnerable neurons in Alzheimer's Disease
#> 8 A human fetal lung cell atlas uncovers proximal-distal gradients of differentiation and key regulators of epithelial fates
#> 9 Single-cell and spatial transcriptomics characterisation of the immunological landscape in the healthy and PSC human liver
#> 10 Single-Cell, Single-Nucleus, and Spatial RNA Sequencing of the Human Liver Identifies Cholangiocyte and Mesenchymal Heterogeneity
#> collection_doi dataset_id
#> 1 10.1002/hep4.1854 0895c838-e550-48a3-a777-dbcd35d30272
#> 2 10.1126/sciimmunol.abe6291 00ff600e-6e2e-4d76-846f-0eec4f0ae417
#> 3 10.1038/s41593-020-00764-7 bdacc907-7c26-419f-8808-969eab3ca2e8
#> 4 10.1038/s41467-022-29450-x a5d95a42-0137-496f-8a60-101e17f263c8
#> 5 10.1038/s41590-021-01059-0 d3566d6a-a455-4a15-980f-45eb29114cab
#> 6 10.1016/j.celrep.2019.12.082 de17ac25-550a-4018-be75-bbb485a0636e
#> 7 10.1038/s41593-020-00764-7 9f1049ac-f8b7-45ad-8e31-6e96c3e5058f
#> 8 10.1016/j.cell.2022.11.005 703f00e6-b996-48e5-bc34-00c41b9876f4
#> 9 10.1016/j.jhep.2023.12.023 e347396c-a7ff-4691-9f7a-99a43555ca18
#> 10 10.1002/hep4.1854 524e045e-e74c-4e00-9884-d5c3bef3d862
#> dataset_version_id
#> 1 fb76c95f-0391-4fac-9fb9-082ce2430b59
#> 2 b6737a5e-9069-4dd6-9a57-92e17a746df9
#> 3 0e02290f-b992-450b-8a19-554f73cd7f09
#> 4 40832710-d7b1-43fb-b2c2-1cd2255bc3ac
#> 5 eb6c070c-ff67-4c1f-8d4d-65f9fe2119ee
#> 6 650a47be-6666-4f70-ac47-8414c50bbd8e
#> 7 1f0cd8ed-94c6-440c-bd5b-bad55e2666b1
#> 8 086c8f4e-3fe7-46e5-8b6b-a8cb7f92dadd
#> 9 7e88c4aa-d95f-43e3-bc8b-cf02629d7301
#> 10 575dd70a-45b1-4a82-8acd-467d313e4c66
#> dataset_title
#> 1 Healthy human liver: B cells
#> 2 Human tonsil nonlymphoid cells scRNA
#> 3 Molecular characterization of selectively vulnerable neurons in Alzheimer’s Disease: SFG microglia
#> 4 Steady-state B cells - scRNA-seq
#> 5 blood and bone marrow from a healthy young donor
#> 6 Myeloid cells of human eye
#> 7 Molecular characterization of selectively vulnerable neurons in Alzheimer’s Disease: EC microglia
#> 8 PNS
#> 9 Stellate cells from human healthy donor liver samples
#> 10 Healthy human liver: hepatic stellate cells
#> dataset_h5ad_path dataset_total_cell_count
#> 1 0895c838-e550-48a3-a777-dbcd35d30272.h5ad 146
#> 2 00ff600e-6e2e-4d76-846f-0eec4f0ae417.h5ad 363
#> 3 bdacc907-7c26-419f-8808-969eab3ca2e8.h5ad 3799
#> 4 a5d95a42-0137-496f-8a60-101e17f263c8.h5ad 1324
#> 5 d3566d6a-a455-4a15-980f-45eb29114cab.h5ad 15502
#> 6 de17ac25-550a-4018-be75-bbb485a0636e.h5ad 395
#> 7 9f1049ac-f8b7-45ad-8e31-6e96c3e5058f.h5ad 5572
#> 8 703f00e6-b996-48e5-bc34-00c41b9876f4.h5ad 649
#> 9 e347396c-a7ff-4691-9f7a-99a43555ca18.h5ad 1417
#> 10 524e045e-e74c-4e00-9884-d5c3bef3d862.h5ad 1374
#> [ reached 'max' / getOption("max.print") -- omitted 802 rows ]
Fetching the dataset presence matrix
Now let’s fetch the dataset presence matrix.
The convenience function get_presence_matrix() reads the entire presence matrix (for Homo sapiens) into a sparse matrix:
presence_matrix <- get_presence_matrix(census, "Homo sapiens", "RNA")
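# dim() on the returned object itself gives NULL, so convert to a regular matrix first
print(dim(presence_matrix$get_one_based_matrix()))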
We also need the var dataframe, which is read into an R data frame for convenient manipulation:
var_df <- as.data.frame(human_rna$var$read()$concat())
print(var_df)
#> soma_joinid feature_id feature_name feature_length nnz n_measured_obs
#> 1 0 ENSG00000000003 TSPAN6 4530 4530448 73855064
#> 2 1 ENSG00000000005 TNMD 1476 236059 61201828
#> 3 2 ENSG00000000419 DPM1 9276 17576462 74159149
#> 4 3 ENSG00000000457 SCYL3 6883 9117322 73988868
#> 5 4 ENSG00000000460 C1orf112 5970 6287794 73636201
#> 6 5 ENSG00000000938 FGR 3382 5667858 74061500
#> 7 6 ENSG00000000971 CFH 15284 4423029 74095467
#> 8 7 ENSG00000001036 FUCA2 2822 10129388 73988868
#> 9 8 ENSG00000001084 GCLC 8618 12662710 74226365
#> 10 9 ENSG00000001167 NFYA 6209 4866873 73988868
#> 11 10 ENSG00000001460 STPG1 8511 6138613 74132995
#> 12 11 ENSG00000001461 NIPAL3 9396 13079400 74197924
#> 13 12 ENSG00000001497 LAS1L 12555 10716777 73858963
#> 14 13 ENSG00000001561 ENPP4 4644 9509567 74091113
#> 15 14 ENSG00000001617 SEMA3F 4826 2055855 72941475
#> 16 15 ENSG00000001626 CFTR 16821 5419122 71475983
#> [ reached 'max' / getOption("max.print") -- omitted 60514 rows ]
Identifying genes measured in a specific dataset
Now that we have the dataset table, the genes metadata table, and the dataset presence matrix, we can check whether a gene or set of genes was measured in a specific dataset.
Important: the presence matrix is indexed by soma_joinid, and is NOT positionally indexed. In other words:
- the first dimension of the presence matrix is the dataset’s soma_joinid, as stored in the census_datasets dataframe.
- the second dimension of the presence matrix is the feature’s soma_joinid, as stored in the var dataframe.
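The difference between joinid-based and positional lookup can be seen in a small sketch. The table below is a hypothetical, filtered var table (the "gene-*" names are invented), chosen so that row positions no longer line up with soma_joinid values.

```r
# Toy var table (hypothetical values); rows are not contiguous in soma_joinid
toy_var <- data.frame(
  soma_joinid = c(0L, 2L, 5L),
  feature_id = c("gene-a", "gene-c", "gene-f"),
  stringsAsFactors = FALSE
)

joinids <- c(2L, 5L)

# Positional lookup: treats the joinids as row numbers, which is not what we want
by_position <- toy_var[joinids, ]

# Joinid lookup: match against the soma_joinid column instead
by_joinid <- toy_var[toy_var$soma_joinid %in% joinids, ]
print(by_joinid$feature_id)
```

Here the positional lookup returns the row for "gene-c" followed by a row of NA values, while the joinid lookup returns the two intended features.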
The presence matrix has a method $take() that lets you slice it by soma_joinids from census_datasets and var. The full presence matrix, or slices of it, can then be exported to a regular matrix with the method $get_one_based_matrix().
Let’s find out if the gene "ENSG00000286096" was measured in the dataset with id "97a17473-e2b1-4f31-a544-44a60773e2dd".
# Get soma_joinid for datasets and genes of interest
var_joinid <- var_df$soma_joinid[var_df$feature_id == "ENSG00000286096"]
dataset_joinid <- datasets_df$soma_joinid[datasets_df$dataset_id == "97a17473-e2b1-4f31-a544-44a60773e2dd"]
# Slice presence matrix with datasets and genes of interest
presence_matrix_slice <- presence_matrix$take(i = dataset_joinid, j = var_joinid)
# Convert presence matrix to regular matrix
presence_matrix_slice <- presence_matrix_slice$get_one_based_matrix()
# Find out whether the gene is present in this dataset
is_present <- presence_matrix_slice[, , drop = TRUE]
cat(paste("Feature is", if (is_present) "present." else "not present."))
#> Feature is present.
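The same check extends to a set of genes. The sketch below works on toy data rather than the Census: the tables, the "ds-*" and "gene-*" identifiers, and the helper genes_present() are all hypothetical. It assumes a regular one-based matrix in which row r and column c correspond to joinids r - 1 and c - 1, as would be the case after exporting the full presence matrix.

```r
library(Matrix)

# Toy stand-ins for the datasets table, var table and presence matrix
toy_datasets <- data.frame(
  soma_joinid = 0:1,
  dataset_id = c("ds-a", "ds-b"),
  stringsAsFactors = FALSE
)
toy_var <- data.frame(
  soma_joinid = 0:3,
  feature_id = c("gene-a", "gene-b", "gene-c", "gene-d"),
  stringsAsFactors = FALSE
)
toy_presence <- sparseMatrix(
  i = c(1, 1, 2, 2, 2),
  j = c(1, 2, 1, 3, 4),
  x = TRUE,
  dims = c(2, 4)
)

# Hypothetical helper: one TRUE/FALSE per requested gene, named by feature_id
genes_present <- function(presence, datasets, var, dataset_id, feature_ids) {
  # Zero-based joinids become one-based matrix positions
  row <- datasets$soma_joinid[datasets$dataset_id == dataset_id] + 1
  cols <- var$soma_joinid[match(feature_ids, var$feature_id)] + 1
  setNames(as.logical(presence[row, cols]), feature_ids)
}

result <- genes_present(
  toy_presence, toy_datasets, toy_var,
  dataset_id = "ds-b",
  feature_ids = c("gene-a", "gene-b", "gene-d")
)
print(result)
```

Returning a named vector keeps each answer attached to the gene it refers to, which avoids relying on the order of the input.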
Identifying datasets that measured specific genes
Similarly, we can determine the datasets that measured a specific gene or set of genes.
# Grab the feature's soma_joinid from the var dataframe
var_joinid <- var_df$soma_joinid[var_df$feature_id == "ENSG00000286096"]
# The presence matrix is indexed by the joinids of the dataset and var dataframes,
# so slice out the feature of interest by its joinid.
presence_matrix_slice <- presence_matrix$take(j = var_joinid)$get_one_based_matrix()
measured_datasets <- presence_matrix_slice[, , drop = TRUE] != 0
dataset_joinids <- datasets_df$soma_joinid[measured_datasets]
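# From the datasets dataframe, slice out the datasets which have a joinid in the list.
# Match on the soma_joinid column: joinids are zero-based, not R row positions.
print(datasets_df[datasets_df$soma_joinid %in% dataset_joinids, ])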
#> soma_joinid
#> 264 263
#> 272 271
#> 299 298
#> 302 301
#> 316 315
#> 335 334
#> 350 349
#> 352 351
#> 353 352
#> 367 366
#> citation
#> 264 Publication: https://doi.org/10.1038/s41586-020-2496-1 Dataset Version: https://datasets.cellxgene.cziscience.com/a87a3135-3ce7-405e-bc07-b7eac2c09cc6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/0b9d8a04-bb9d-44da-aa27-705bb65b54eb
#> 272 Publication: https://doi.org/10.1016/j.cell.2021.12.018 Dataset Version: https://datasets.cellxgene.cziscience.com/80650a81-3bde-49cf-873b-548425184f7a.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/74e10dc4-cbb2-4605-a189-8a1cd8e44d8c
#> 299 Publication: https://doi.org/10.1101/2023.05.08.539485 Dataset Version: https://datasets.cellxgene.cziscience.com/5097433f-5cf0-4a5e-b252-b9f012e76d46.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/1ca90a2d-2943-483d-b678-b809bf464c30
#> 302 Publication: https://doi.org/10.1101/2023.05.08.539485 Dataset Version: https://datasets.cellxgene.cziscience.com/d8df87f4-279a-4e85-a192-b3fe0df3143e.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/1ca90a2d-2943-483d-b678-b809bf464c30
#> 316 Publication: https://doi.org/10.1016/j.cell.2020.08.013 Dataset Version: https://datasets.cellxgene.cziscience.com/74a8e0fe-08ee-4a93-9c2f-9cb70acd714b.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/2f4c738f-e2f3-4553-9db2-0582a38ea4dc
#> 335 Publication: https://doi.org/10.1016/j.cell.2021.07.023 Dataset Version: https://datasets.cellxgene.cziscience.com/837efefb-05c5-4dfc-9e02-17842c1880f2.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/35d0b748-3eed-43a5-a1c4-1dade5ec5ca0
#> 350 Publication: https://doi.org/10.1038/s41586-020-2797-4 Dataset Version: https://datasets.cellxgene.cziscience.com/0f5dba64-8621-420f-a404-2f6836cbe1ce.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/b52eb423-5d0d-4645-b217-e1c6d38b2e72
#> 352 Publication: https://doi.org/10.1038/s41586-020-2496-1 Dataset Version: https://datasets.cellxgene.cziscience.com/1bd1f7fb-5efe-4886-abc5-aa08bf218a83.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/0b9d8a04-bb9d-44da-aa27-705bb65b54eb
#> 353 Publication: https://doi.org/10.1126/science.add7046 Dataset Version: https://datasets.cellxgene.cziscience.com/fb6148b1-20ab-4e0e-bb6d-e5d8bb2147f2.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/283d65eb-dd53-496d-adb7-7570c7caa443
#> 367 Publication: https://doi.org/10.1126/science.add7046 Dataset Version: https://datasets.cellxgene.cziscience.com/628158bf-037c-4198-8e4e-b97e2675506b.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/283d65eb-dd53-496d-adb7-7570c7caa443
#> collection_id
#> 264 0b9d8a04-bb9d-44da-aa27-705bb65b54eb
#> 272 74e10dc4-cbb2-4605-a189-8a1cd8e44d8c
#> 299 1ca90a2d-2943-483d-b678-b809bf464c30
#> 302 1ca90a2d-2943-483d-b678-b809bf464c30
#> 316 2f4c738f-e2f3-4553-9db2-0582a38ea4dc
#> 335 35d0b748-3eed-43a5-a1c4-1dade5ec5ca0
#> 350 b52eb423-5d0d-4645-b217-e1c6d38b2e72
#> 352 0b9d8a04-bb9d-44da-aa27-705bb65b54eb
#> 353 283d65eb-dd53-496d-adb7-7570c7caa443
#> 367 283d65eb-dd53-496d-adb7-7570c7caa443
#> collection_name
#> 264 Tabula Muris Senis
#> 272 Spatial proteogenomics reveals distinct and evolutionarily conserved hepatic macrophage niches
#> 299 SEA-AD: Seattle Alzheimer’s Disease Brain Cell Atlas
#> 302 SEA-AD: Seattle Alzheimer’s Disease Brain Cell Atlas
#> 316 Cell Types of the Human Retina and Its Organoids at Single-Cell Resolution
#> 335 Impaired local intrinsic immunity to SARS-CoV-2 infection in severe COVID-19
#> 350 Cells of the adult human heart
#> 352 Tabula Muris Senis
#> 353 Human Brain Cell Atlas v1.0
#> 367 Human Brain Cell Atlas v1.0
#> collection_doi dataset_id
#> 264 10.1038/s41586-020-2496-1 93966790-bbfa-420f-aa85-bc5ca51d9c96
#> 272 10.1016/j.cell.2021.12.018 d1cbed97-d88f-4954-8925-13302fe30b39
#> 299 10.1101/2023.05.08.539485 f67f2cfa-ba45-4f77-8e26-64a15f666043
#> 302 10.1101/2023.05.08.539485 8c6ed88f-11bf-4159-bad7-ff41fb1e1eca
#> 316 10.1016/j.cell.2020.08.013 babbf9f3-f482-45a5-ba76-c41f4c2f503e
#> 335 10.1016/j.cell.2021.07.023 13b61a7d-5605-4948-ba48-02c588960143
#> 350 10.1038/s41586-020-2797-4 9d584fcb-a28a-4b91-a886-ceb66a88ef81
#> 352 10.1038/s41586-020-2496-1 0fb7916e-7a68-4a4c-a441-3ab3989f29a7
#> 353 10.1126/science.add7046 93131426-0124-4ab4-a013-9dfbcd99d467
#> 367 10.1126/science.add7046 7d3ab174-e433-40fc-a352-6fe71b1a19f9
#> dataset_version_id
#> 264 a87a3135-3ce7-405e-bc07-b7eac2c09cc6
#> 272 80650a81-3bde-49cf-873b-548425184f7a
#> 299 5097433f-5cf0-4a5e-b252-b9f012e76d46
#> 302 d8df87f4-279a-4e85-a192-b3fe0df3143e
#> 316 74a8e0fe-08ee-4a93-9c2f-9cb70acd714b
#> 335 837efefb-05c5-4dfc-9e02-17842c1880f2
#> 350 0f5dba64-8621-420f-a404-2f6836cbe1ce
#> 352 1bd1f7fb-5efe-4886-abc5-aa08bf218a83
#> 353 fb6148b1-20ab-4e0e-bb6d-e5d8bb2147f2
#> 367 628158bf-037c-4198-8e4e-b97e2675506b
#> dataset_title
#> 264 Bladder lumen - A single-cell transcriptomic atlas characterizes ageing tissues in the mouse - 10x
#> 272 CD45 negative cells from human liver dataset
#> 299 Pax6 - MTG: Seattle Alzheimer's Disease Atlas (SEA-AD)
#> 302 Chandelier - MTG: Seattle Alzheimer's Disease Atlas (SEA-AD)
#> 316 Fovea - Cell Types of the Human Retina and Its Organoids at Single-Cell Resolution
#> 335 Nasopharynx
#> 350 Fibroblasts <U+2014> Cells of the adult human heart
#> 352 Lung - A single-cell transcriptomic atlas characterizes ageing tissues in the mouse - 10x
#> 353 Dissection: Epithalamus - ETH
#> 367 Dissection: Amygdaloid complex (AMY) - Basolateral nuclear group (BLN) - lateral nucleus - La
#> dataset_h5ad_path dataset_total_cell_count
#> 264 93966790-bbfa-420f-aa85-bc5ca51d9c96.h5ad 8945
#> 272 d1cbed97-d88f-4954-8925-13302fe30b39.h5ad 15481
#> 299 f67f2cfa-ba45-4f77-8e26-64a15f666043.h5ad 8463
#> 302 8c6ed88f-11bf-4159-bad7-ff41fb1e1eca.h5ad 9893
#> 316 babbf9f3-f482-45a5-ba76-c41f4c2f503e.h5ad 19768
#> 335 13b61a7d-5605-4948-ba48-02c588960143.h5ad 32588
#> 350 9d584fcb-a28a-4b91-a886-ceb66a88ef81.h5ad 59341
#> 352 0fb7916e-7a68-4a4c-a441-3ab3989f29a7.h5ad 24540
#> 353 93131426-0124-4ab4-a013-9dfbcd99d467.h5ad 24327
#> 367 7d3ab174-e433-40fc-a352-6fe71b1a19f9.h5ad 28984
#> [ reached 'max' / getOption("max.print") -- omitted 32 rows ]
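When several genes are of interest, one may want the datasets that measured all of them rather than any of them. A minimal sketch on toy data follows; the tables and identifiers are invented for illustration, and the matrix is assumed to be a regular one-based matrix whose rows and columns follow joinid order.

```r
library(Matrix)

# Toy datasets table and presence matrix (hypothetical values)
toy_datasets <- data.frame(
  soma_joinid = 0:2,
  dataset_id = c("ds-a", "ds-b", "ds-c"),
  stringsAsFactors = FALSE
)
toy_presence <- sparseMatrix(
  i = c(1, 1, 2, 2, 2, 3),
  j = c(1, 2, 1, 3, 4, 4),
  x = TRUE,
  dims = c(3, 4)
)

# Feature joinids of interest (zero-based), converted to one-based columns
gene_joinids <- c(0, 3)
slice <- toy_presence[, gene_joinids + 1, drop = FALSE]

# Count how many of the requested genes each dataset measured
n_measured <- Matrix::rowSums(slice)
measured_any <- n_measured > 0
measured_all <- n_measured == length(gene_joinids)

print(toy_datasets$dataset_id[measured_all])
```

In this toy example every dataset measured at least one of the two genes, but only "ds-b" measured both.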
Identifying all genes measured in a dataset
Finally, we can find the set of genes that were measured in the cells of a given dataset.
# Select the dataset(s) of interest (here, all datasets of one collection), and get the joinid(s)
dataset_joinids <- datasets_df$soma_joinid[datasets_df$collection_id == "17481d16-ee44-49e5-bcf0-28c0780d8c4a"]
# Slice the presence matrix by the first dimension, i.e., by dataset
presence_matrix_slice <- presence_matrix$take(i = dataset_joinids)$get_one_based_matrix()
genes_measured <- Matrix::colSums(presence_matrix_slice) > 0
var_joinids <- var_df$soma_joinid[genes_measured]
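# Match on the soma_joinid column: joinids are zero-based, not R row positions
print(var_df[var_df$soma_joinid %in% var_joinids, ])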
#> soma_joinid feature_id feature_name feature_length nnz n_measured_obs
#> 1 0 ENSG00000000003 TSPAN6 4530 4530448 73855064
#> 2 1 ENSG00000000005 TNMD 1476 236059 61201828
#> 3 2 ENSG00000000419 DPM1 9276 17576462 74159149
#> 4 3 ENSG00000000457 SCYL3 6883 9117322 73988868
#> 5 4 ENSG00000000460 C1orf112 5970 6287794 73636201
#> 6 5 ENSG00000000938 FGR 3382 5667858 74061500
#> 7 6 ENSG00000000971 CFH 15284 4423029 74095467
#> 8 7 ENSG00000001036 FUCA2 2822 10129388 73988868
#> 9 8 ENSG00000001084 GCLC 8618 12662710 74226365
#> 10 9 ENSG00000001167 NFYA 6209 4866873 73988868
#> 11 10 ENSG00000001460 STPG1 8511 6138613 74132995
#> 12 11 ENSG00000001461 NIPAL3 9396 13079400 74197924
#> 13 12 ENSG00000001497 LAS1L 12555 10716777 73858963
#> 14 13 ENSG00000001561 ENPP4 4644 9509567 74091113
#> 15 14 ENSG00000001617 SEMA3F 4826 2055855 72941475
#> 16 15 ENSG00000001626 CFTR 16821 5419122 71475983
#> [ reached 'max' / getOption("max.print") -- omitted 27130 rows ]
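The column sums above give the union of genes across the selected datasets. If the genes shared by every selected dataset are needed instead, the comparison can be changed as in the sketch below. It uses toy data only: the tables and identifiers are invented, and the matrix is assumed to be a regular one-based matrix in joinid order.

```r
library(Matrix)

# Toy var table and presence matrix (hypothetical values)
toy_var <- data.frame(
  soma_joinid = 0:3,
  feature_id = c("gene-a", "gene-b", "gene-c", "gene-d"),
  stringsAsFactors = FALSE
)
toy_presence <- sparseMatrix(
  i = c(1, 1, 2, 2, 2, 3),
  j = c(1, 2, 1, 3, 4, 4),
  x = TRUE,
  dims = c(3, 4)
)

# Dataset joinids of interest (zero-based), converted to one-based rows
dataset_joinids <- c(0, 1)
slice <- toy_presence[dataset_joinids + 1, , drop = FALSE]

# Union: measured in at least one selected dataset
in_any <- Matrix::colSums(slice) > 0
# Intersection: measured in every selected dataset
in_all <- Matrix::colSums(slice) == nrow(slice)

print(toy_var$feature_id[in_all])
```

The intersection is usually the safer gene set when expression values will be compared across the selected datasets, since a gene missing from one dataset cannot be distinguished from a gene with zero counts there.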