Genes measured in each cell (dataset presence matrix)

The Census is a compilation of cells from multiple datasets, which may differ in the sets of genes they measure. This notebook describes how to identify the genes measured in each dataset.

The presence matrix is a sparse boolean array, indicating which features (var) were present in each dataset. The array has dimensions [n_datasets, n_var], and is stored in the SOMA Measurement varp collection. The first dimension is indexed by the soma_joinid in the census_datasets dataframe. The second is indexed by the soma_joinid in the var dataframe of the measurement.
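To make this layout concrete, the sketch below builds a small stand-in with `sparseMatrix()` from the Matrix package (a recommended package shipped with R). Passing `index1 = FALSE` makes the row and column indices zero-based, like soma_joinids. The three datasets, four features and their entries are invented for illustration; this mimics only the shape of the Census matrix, not how it is stored.

```r
library(Matrix)

# Toy presence matrix: 3 datasets (soma_joinid 0..2) x 4 features (soma_joinid 0..3).
# Each (dataset, feature) pair below is given as zero-based soma_joinids.
present_dataset_ids <- c(0, 0, 1, 1, 1, 2)
present_var_ids     <- c(0, 2, 0, 1, 3, 2)

toy_presence <- sparseMatrix(
  i = present_dataset_ids,
  j = present_var_ids,
  x = TRUE,            # recycled: every listed pair is "present"
  dims = c(3, 4),      # [n_datasets, n_var]
  index1 = FALSE       # interpret i and j as zero-based
)

# R itself is one-based: row r + 1 holds dataset soma_joinid r,
# column c + 1 holds feature soma_joinid c.
print(dim(toy_presence))
print(as.matrix(toy_presence))
```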

As a reminder, the obs dataframe has a dataset_id column that can be used to link any cell in the Census to the presence matrix, via that dataset's soma_joinid in the census_datasets dataframe.
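A minimal sketch of that lookup chain on invented data follows. `datasets_toy`, `obs_toy`, `var_toy` and `presence_toy` are hypothetical stand-ins for census_datasets, obs, var and the presence matrix. The cell's dataset_id is mapped to the dataset's soma_joinid with `match()`, and the +1 / -1 shifts convert between zero-based joinids and one-based R positions.

```r
library(Matrix)

# Hypothetical stand-ins for census_datasets, obs and var
datasets_toy <- data.frame(
  soma_joinid = 0:2,
  dataset_id = c("ds-a", "ds-b", "ds-c"),
  stringsAsFactors = FALSE
)
obs_toy <- data.frame(
  cell = c("cell1", "cell2", "cell3"),
  dataset_id = c("ds-b", "ds-a", "ds-b"),
  stringsAsFactors = FALSE
)
var_toy <- data.frame(
  soma_joinid = 0:3,
  feature_id = c("G0", "G1", "G2", "G3"),
  stringsAsFactors = FALSE
)

# Toy presence matrix: rows = dataset joinids 0..2, columns = var joinids 0..3
presence_toy <- sparseMatrix(
  i = c(0, 0, 1, 1, 1, 2), j = c(0, 2, 0, 1, 3, 2),
  x = TRUE, dims = c(3, 4), index1 = FALSE
)

# cell -> dataset_id -> dataset soma_joinid (zero-based)
cell_dataset_joinid <- datasets_toy$soma_joinid[match(obs_toy$dataset_id, datasets_toy$dataset_id)]

# Genes measured for the first cell: row = joinid + 1; column position - 1 = var joinid
measured <- presence_toy[cell_dataset_joinid[1] + 1, ]
measured_var_joinids <- which(measured) - 1
genes_for_cell1 <- var_toy$feature_id[match(measured_var_joinids, var_toy$soma_joinid)]
print(genes_for_cell1)
```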

Contents

Opening the Census.
Fetching the IDs of the Census datasets.
Fetching the dataset presence matrix.
Identifying genes measured in a specific dataset.
Identifying datasets that measured specific genes.
Identifying all genes measured in a dataset.

Opening the Census

The cellxgene.census R package contains a convenient API to open any version of the Census (by default, the newest stable version).

library("cellxgene.census")
census <- open_soma()

Fetching the IDs of the Census datasets

Let’s grab a table of all the datasets included in the Census and use this table in combination with the presence matrix below.

# Grab the experiment containing human data, and the measurement therein with RNA
human <- census$get("census_data")$get("homo_sapiens")
human_rna <- human$ms$get("RNA")

# The census-wide datasets
datasets_df <- as.data.frame(census$get("census_info")$get("datasets")$read()$concat())
print(datasets_df)
#>    soma_joinid
#> 1            0
#> 2            1
#> 3            2
#> 4            3
#> 5            4
#> 6            5
#> 7            6
#> 8            7
#> 9            8
#> 10           9
#>                                                                                                                                                                                                                                                                                                            citation
#> 1             Publication: https://doi.org/10.1002/hep4.1854 Dataset Version: https://datasets.cellxgene.cziscience.com/fb76c95f-0391-4fac-9fb9-082ce2430b59.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/44531dd9-1388-4416-a117-af0a99de2294
#> 2    Publication: https://doi.org/10.1126/sciimmunol.abe6291 Dataset Version: https://datasets.cellxgene.cziscience.com/b6737a5e-9069-4dd6-9a57-92e17a746df9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/3a2af25b-2338-4266-aad3-aa8d07473f50
#> 3    Publication: https://doi.org/10.1038/s41593-020-00764-7 Dataset Version: https://datasets.cellxgene.cziscience.com/0e02290f-b992-450b-8a19-554f73cd7f09.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/180bff9c-c8a5-4539-b13b-ddbc00d643e6
#> 4    Publication: https://doi.org/10.1038/s41467-022-29450-x Dataset Version: https://datasets.cellxgene.cziscience.com/40832710-d7b1-43fb-b2c2-1cd2255bc3ac.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/bf325905-5e8e-42e3-933d-9a9053e9af80
#> 5    Publication: https://doi.org/10.1038/s41590-021-01059-0 Dataset Version: https://datasets.cellxgene.cziscience.com/eb6c070c-ff67-4c1f-8d4d-65f9fe2119ee.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/93eebe82-d8c3-41bc-a906-63b5b5f24a9d
#> 6  Publication: https://doi.org/10.1016/j.celrep.2019.12.082 Dataset Version: https://datasets.cellxgene.cziscience.com/650a47be-6666-4f70-ac47-8414c50bbd8e.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/939769a8-d8d2-4d01-abfc-55699893fd49
#> 7    Publication: https://doi.org/10.1038/s41593-020-00764-7 Dataset Version: https://datasets.cellxgene.cziscience.com/1f0cd8ed-94c6-440c-bd5b-bad55e2666b1.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/180bff9c-c8a5-4539-b13b-ddbc00d643e6
#> 8    Publication: https://doi.org/10.1016/j.cell.2022.11.005 Dataset Version: https://datasets.cellxgene.cziscience.com/086c8f4e-3fe7-46e5-8b6b-a8cb7f92dadd.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/2d2e2acd-dade-489f-a2da-6c11aa654028
#> 9    Publication: https://doi.org/10.1016/j.jhep.2023.12.023 Dataset Version: https://datasets.cellxgene.cziscience.com/7e88c4aa-d95f-43e3-bc8b-cf02629d7301.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/0c8a364b-97b5-4cc8-a593-23c38c6f0ac5
#> 10            Publication: https://doi.org/10.1002/hep4.1854 Dataset Version: https://datasets.cellxgene.cziscience.com/575dd70a-45b1-4a82-8acd-467d313e4c66.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/44531dd9-1388-4416-a117-af0a99de2294
#>                           collection_id
#> 1  44531dd9-1388-4416-a117-af0a99de2294
#> 2  3a2af25b-2338-4266-aad3-aa8d07473f50
#> 3  180bff9c-c8a5-4539-b13b-ddbc00d643e6
#> 4  bf325905-5e8e-42e3-933d-9a9053e9af80
#> 5  93eebe82-d8c3-41bc-a906-63b5b5f24a9d
#> 6  939769a8-d8d2-4d01-abfc-55699893fd49
#> 7  180bff9c-c8a5-4539-b13b-ddbc00d643e6
#> 8  2d2e2acd-dade-489f-a2da-6c11aa654028
#> 9  0c8a364b-97b5-4cc8-a593-23c38c6f0ac5
#> 10 44531dd9-1388-4416-a117-af0a99de2294
#>                                                                                                                                         collection_name
#> 1                     Single-Cell, Single-Nucleus, and Spatial RNA Sequencing of the Human Liver Identifies Cholangiocyte and Mesenchymal Heterogeneity
#> 2                                       Single-cell analysis of human B cell maturation predicts how antibody class switching shapes selection dynamics
#> 3                                                                   Molecular characterization of selectively vulnerable neurons in Alzheimer's Disease
#> 4                   Single-cell Atlas of common variable immunodeficiency shows germinal center-associated epigenetic dysregulation in B-cell responses
#> 5  Single-cell proteo-genomic reference maps of the hematopoietic system enable the purification and massive profiling of precisely defined cell states
#> 6                             Integration of eQTL and a Single-Cell Atlas in the Human Eye Identifies Causal Genes for Age-Related Macular Degeneration
#> 7                                                                   Molecular characterization of selectively vulnerable neurons in Alzheimer's Disease
#> 8                            A human fetal lung cell atlas uncovers proximal-distal gradients of differentiation and key regulators of epithelial fates
#> 9                            Single-cell and spatial transcriptomics characterisation of the immunological landscape in the healthy and PSC human liver
#> 10                    Single-Cell, Single-Nucleus, and Spatial RNA Sequencing of the Human Liver Identifies Cholangiocyte and Mesenchymal Heterogeneity
#>                  collection_doi                           dataset_id
#> 1             10.1002/hep4.1854 0895c838-e550-48a3-a777-dbcd35d30272
#> 2    10.1126/sciimmunol.abe6291 00ff600e-6e2e-4d76-846f-0eec4f0ae417
#> 3    10.1038/s41593-020-00764-7 bdacc907-7c26-419f-8808-969eab3ca2e8
#> 4    10.1038/s41467-022-29450-x a5d95a42-0137-496f-8a60-101e17f263c8
#> 5    10.1038/s41590-021-01059-0 d3566d6a-a455-4a15-980f-45eb29114cab
#> 6  10.1016/j.celrep.2019.12.082 de17ac25-550a-4018-be75-bbb485a0636e
#> 7    10.1038/s41593-020-00764-7 9f1049ac-f8b7-45ad-8e31-6e96c3e5058f
#> 8    10.1016/j.cell.2022.11.005 703f00e6-b996-48e5-bc34-00c41b9876f4
#> 9    10.1016/j.jhep.2023.12.023 e347396c-a7ff-4691-9f7a-99a43555ca18
#> 10            10.1002/hep4.1854 524e045e-e74c-4e00-9884-d5c3bef3d862
#>                      dataset_version_id
#> 1  fb76c95f-0391-4fac-9fb9-082ce2430b59
#> 2  b6737a5e-9069-4dd6-9a57-92e17a746df9
#> 3  0e02290f-b992-450b-8a19-554f73cd7f09
#> 4  40832710-d7b1-43fb-b2c2-1cd2255bc3ac
#> 5  eb6c070c-ff67-4c1f-8d4d-65f9fe2119ee
#> 6  650a47be-6666-4f70-ac47-8414c50bbd8e
#> 7  1f0cd8ed-94c6-440c-bd5b-bad55e2666b1
#> 8  086c8f4e-3fe7-46e5-8b6b-a8cb7f92dadd
#> 9  7e88c4aa-d95f-43e3-bc8b-cf02629d7301
#> 10 575dd70a-45b1-4a82-8acd-467d313e4c66
#>                                                                                                dataset_title
#> 1                                                                               Healthy human liver: B cells
#> 2                                                                       Human tonsil nonlymphoid cells scRNA
#> 3         Molecular characterization of selectively vulnerable neurons in Alzheimer’s Disease: SFG microglia
#> 4                                                                           Steady-state B cells - scRNA-seq
#> 5                                                           blood and bone marrow from a healthy young donor
#> 6                                                                                 Myeloid cells of human eye
#> 7          Molecular characterization of selectively vulnerable neurons in Alzheimer’s Disease: EC microglia
#> 8                                                                                                        PNS
#> 9                                                      Stellate cells from human healthy donor liver samples
#> 10                                                               Healthy human liver: hepatic stellate cells
#>                            dataset_h5ad_path dataset_total_cell_count
#> 1  0895c838-e550-48a3-a777-dbcd35d30272.h5ad                      146
#> 2  00ff600e-6e2e-4d76-846f-0eec4f0ae417.h5ad                      363
#> 3  bdacc907-7c26-419f-8808-969eab3ca2e8.h5ad                     3799
#> 4  a5d95a42-0137-496f-8a60-101e17f263c8.h5ad                     1324
#> 5  d3566d6a-a455-4a15-980f-45eb29114cab.h5ad                    15502
#> 6  de17ac25-550a-4018-be75-bbb485a0636e.h5ad                      395
#> 7  9f1049ac-f8b7-45ad-8e31-6e96c3e5058f.h5ad                     5572
#> 8  703f00e6-b996-48e5-bc34-00c41b9876f4.h5ad                      649
#> 9  e347396c-a7ff-4691-9f7a-99a43555ca18.h5ad                     1417
#> 10 524e045e-e74c-4e00-9884-d5c3bef3d862.h5ad                     1374
#>  [ reached 'max' / getOption("max.print") -- omitted 802 rows ]

Fetching the dataset presence matrix

Now let’s fetch the dataset presence matrix.

For convenience, read the entire presence matrix for Homo sapiens into memory as a sparse matrix, using the get_presence_matrix() helper function:

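presence_matrix <- get_presence_matrix(census, "Homo sapiens", "RNA")
# The returned object is not a base R matrix, so base dim() returns NULL;
# use the object's own $dim() method instead
print(presence_matrix$dim())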

We also need the var dataframe, which is read into an R data frame for convenient manipulation:

var_df <- as.data.frame(human_rna$var$read()$concat())
print(var_df)
#>    soma_joinid      feature_id feature_name feature_length      nnz n_measured_obs
#> 1            0 ENSG00000000003       TSPAN6           4530  4530448       73855064
#> 2            1 ENSG00000000005         TNMD           1476   236059       61201828
#> 3            2 ENSG00000000419         DPM1           9276 17576462       74159149
#> 4            3 ENSG00000000457        SCYL3           6883  9117322       73988868
#> 5            4 ENSG00000000460     C1orf112           5970  6287794       73636201
#> 6            5 ENSG00000000938          FGR           3382  5667858       74061500
#> 7            6 ENSG00000000971          CFH          15284  4423029       74095467
#> 8            7 ENSG00000001036        FUCA2           2822 10129388       73988868
#> 9            8 ENSG00000001084         GCLC           8618 12662710       74226365
#> 10           9 ENSG00000001167         NFYA           6209  4866873       73988868
#> 11          10 ENSG00000001460        STPG1           8511  6138613       74132995
#> 12          11 ENSG00000001461       NIPAL3           9396 13079400       74197924
#> 13          12 ENSG00000001497        LAS1L          12555 10716777       73858963
#> 14          13 ENSG00000001561        ENPP4           4644  9509567       74091113
#> 15          14 ENSG00000001617       SEMA3F           4826  2055855       72941475
#> 16          15 ENSG00000001626         CFTR          16821  5419122       71475983
#>  [ reached 'max' / getOption("max.print") -- omitted 60514 rows ]

Identifying genes measured in a specific dataset

Now that we have the dataset table, the gene metadata table (var), and the dataset presence matrix, we can check whether a gene or set of genes was measured in a specific dataset.

Important: the presence matrix is indexed by soma_joinid, and is NOT positionally indexed. In other words:

the first dimension of the presence matrix is the dataset’s soma_joinid, as stored in the census_datasets dataframe.
the second dimension of the presence matrix is the feature’s soma_joinid, as stored in the var dataframe.
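The distinction matters because soma_joinids start at 0 while R rows start at 1, and a dataframe's rows are not guaranteed to be sorted by soma_joinid. The toy example below (invented data, not Census output) contrasts indexing rows by position with looking them up through the soma_joinid column:

```r
# Toy datasets table whose rows are deliberately NOT in soma_joinid order
datasets_toy <- data.frame(
  soma_joinid = c(2L, 0L, 1L),
  dataset_id  = c("ds-c", "ds-a", "ds-b"),
  stringsAsFactors = FALSE
)

wanted_joinids <- c(0L, 2L)

# Wrong: treats zero-based joinids as one-based row positions.
# Index 0 is silently dropped and index 2 picks whatever row happens to be second.
wrong <- datasets_toy[wanted_joinids, "dataset_id"]

# Right: look rows up by the soma_joinid column itself
right <- datasets_toy$dataset_id[match(wanted_joinids, datasets_toy$soma_joinid)]

print(wrong)
print(right)
```

R drops a zero index without raising an error, which makes the positional version easy to overlook.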

The presence matrix has a $take() method that slices it by soma_joinids: i takes soma_joinids from census_datasets and j takes soma_joinids from var. The full presence matrix, or a slice of it, can then be exported to a regular one-based R matrix with the $get_one_based_matrix() method.
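To see what slicing by soma_joinid means in terms of ordinary R indexing, here is a hypothetical helper, `take_by_joinid()`, which is not part of cellxgene.census or tiledbsoma. It only reproduces the +1 index shift on a toy sparse matrix; the real $take() returns another zero-based view, and its internals may differ.

```r
library(Matrix)

# Hypothetical helper: slice a one-based R matrix with zero-based soma_joinids.
# NULL means "keep the whole dimension".
take_by_joinid <- function(m, i = NULL, j = NULL) {
  rows <- if (is.null(i)) seq_len(nrow(m)) else i + 1
  cols <- if (is.null(j)) seq_len(ncol(m)) else j + 1
  m[rows, cols, drop = FALSE]
}

# Toy presence matrix: rows = dataset joinids 0..2, columns = var joinids 0..3
presence_toy <- sparseMatrix(
  i = c(0, 0, 1, 1, 1, 2), j = c(0, 2, 0, 1, 3, 2),
  x = TRUE, dims = c(3, 4), index1 = FALSE
)

# Dataset joinid 1, feature joinid 3 (R row 2, column 4)
slice <- take_by_joinid(presence_toy, i = 1, j = 3)
print(as.matrix(slice))
```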

Let’s find out whether the gene "ENSG00000286096" was measured in the dataset with ID "97a17473-e2b1-4f31-a544-44a60773e2dd".

# Get soma_joinid for datasets and genes of interest
var_joinid <- var_df$soma_joinid[var_df$feature_id == "ENSG00000286096"]
dataset_joinid <- datasets_df$soma_joinid[datasets_df$dataset_id == "97a17473-e2b1-4f31-a544-44a60773e2dd"]

# Slice presence matrix with datasets and genes of interest
presence_matrix_slice <- presence_matrix$take(i = dataset_joinid, j = var_joinid)

# Convert presence matrix to regular matrix
presence_matrix_slice <- presence_matrix_slice$get_one_based_matrix()

# Check whether the gene is present in this dataset
is_present <- presence_matrix_slice[, , drop = TRUE]
cat(paste("Feature is", if (is_present) "present." else "not present."))
#> Feature is present.
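The same check extends to several genes in one dataset. In this sketch on invented data (`var_toy` and `presence_toy` are hypothetical stand-ins), `match()` is used rather than `%in%` so that the result follows the order of the requested genes instead of the row order of var:

```r
library(Matrix)

# Hypothetical stand-ins for var_df and the presence matrix
var_toy <- data.frame(
  soma_joinid = 0:3,
  feature_id  = c("G0", "G1", "G2", "G3"),
  stringsAsFactors = FALSE
)
presence_toy <- sparseMatrix(
  i = c(0, 0, 1, 1, 1, 2), j = c(0, 2, 0, 1, 3, 2),
  x = TRUE, dims = c(3, 4), index1 = FALSE
)

genes <- c("G3", "G0", "G2")
dataset_joinid <- 1

# Zero-based var joinids in the order of `genes`, shifted to one-based positions
var_joinids <- var_toy$soma_joinid[match(genes, var_toy$feature_id)]
is_present <- presence_toy[dataset_joinid + 1, var_joinids + 1]
names(is_present) <- genes
print(is_present)
```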

Identifying datasets that measured specific genes

Similarly, we can determine the datasets that measured a specific gene or set of genes.
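With a set of genes, a dataset may have measured any or all of them, and the two questions have different answers. A toy sketch of both variants (invented data, not Census output) counts matches per dataset with `rowSums()` and converts one-based row positions back to zero-based dataset joinids:

```r
library(Matrix)

# Toy presence matrix: rows = dataset joinids 0..2, columns = var joinids 0..3
presence_toy <- sparseMatrix(
  i = c(0, 0, 1, 1, 1, 2), j = c(0, 2, 0, 1, 3, 2),
  x = TRUE, dims = c(3, 4), index1 = FALSE
)
gene_joinids <- c(0, 2)  # zero-based var soma_joinids of the genes of interest

cols <- presence_toy[, gene_joinids + 1, drop = FALSE]
hits <- rowSums(cols)    # how many of the genes each dataset measured

# One-based row position - 1 = dataset soma_joinid
datasets_any <- which(hits > 0) - 1
datasets_all <- which(hits == length(gene_joinids)) - 1
print(datasets_any)
print(datasets_all)
```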

# Grab the feature's soma_joinid from the var dataframe
var_joinid <- var_df$soma_joinid[var_df$feature_id == "ENSG00000286096"]

# The presence matrix is indexed by the joinids of the dataset and var dataframes,
# so slice out the feature of interest by its joinid.
presence_matrix_slice <- presence_matrix$take(j = var_joinid)$get_one_based_matrix()
measured_datasets <- presence_matrix_slice[, , drop = TRUE] != 0

# Row k of the one-based matrix corresponds to dataset soma_joinid k - 1
dataset_joinids <- which(measured_datasets) - 1

# From the datasets dataframe, select the rows whose soma_joinid is in the list
# (indexing rows directly by zero-based joinids would be off by one)
print(datasets_df[datasets_df$soma_joinid %in% dataset_joinids, ])

Identifying all genes measured in a dataset

Finally, we can find the set of genes that were measured in the cells of a given dataset, or in any of several datasets, such as all datasets belonging to one collection.
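For several datasets there are two natural answers: genes measured in at least one of them (the union) or in all of them (the intersection). A toy sketch of both using `colSums()` (invented data; the choice of `dataset_joinids` is illustrative only):

```r
library(Matrix)

# Toy presence matrix: rows = dataset joinids 0..2, columns = var joinids 0..3
presence_toy <- sparseMatrix(
  i = c(0, 0, 1, 1, 1, 2), j = c(0, 2, 0, 1, 3, 2),
  x = TRUE, dims = c(3, 4), index1 = FALSE
)
dataset_joinids <- c(0, 1)  # e.g. the datasets of one collection

rows <- presence_toy[dataset_joinids + 1, , drop = FALSE]
n_datasets_measuring <- colSums(rows)

# One-based column position - 1 = var soma_joinid
genes_in_any <- which(n_datasets_measuring > 0) - 1
genes_in_all <- which(n_datasets_measuring == length(dataset_joinids)) - 1
print(genes_in_any)
print(genes_in_all)
```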

# Select the dataset(s) of interest (here, every dataset in one collection) and get their joinids
dataset_joinids <- datasets_df$soma_joinid[datasets_df$collection_id == "17481d16-ee44-49e5-bcf0-28c0780d8c4a"]

# Slice the presence matrix by the first dimension, i.e., by dataset
presence_matrix_slice <- presence_matrix$take(i = dataset_joinids)$get_one_based_matrix()

# A gene counts as measured if at least one of the selected datasets measured it
genes_measured <- Matrix::colSums(presence_matrix_slice) > 0

# Column k of the one-based matrix corresponds to var soma_joinid k - 1
var_joinids <- which(genes_measured) - 1

print(var_df[var_df$soma_joinid %in% var_joinids, ])

Closing the Census

After use, the census object should be closed to release memory and other resources.

census$close()

This also closes all SOMA objects accessed via the top-level census object. Inside a function, closing can be automated by calling on.exit(census$close(), add = TRUE) immediately after census <- open_soma().
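on.exit() only takes effect inside a function, where it runs the cleanup whether the function returns normally or stops with an error. The sketch below demonstrates this offline with a hypothetical `fake_open_soma()` that stands in for open_soma() and merely counts calls to close(); with the real package you would call open_soma() instead.

```r
# Hypothetical stand-in for open_soma(): its close() only increments a counter
closed_log <- new.env()
closed_log$count <- 0

fake_open_soma <- function() {
  list(close = function() closed_log$count <- closed_log$count + 1)
}

analyze <- function(fail = FALSE) {
  census <- fake_open_soma()            # real code: census <- open_soma()
  on.exit(census$close(), add = TRUE)   # runs however the function exits
  if (fail) stop("simulated error during a query")
  "done"
}

analyze()                                  # normal return: closes once
try(analyze(fail = TRUE), silent = TRUE)   # error: still closes
print(closed_log$count)
```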