Genes measured in each cell (dataset presence matrix)
The Census is a compilation of cells from multiple datasets that may differ by the sets of genes they measure. This notebook describes the way to identify the genes measured per dataset.
The presence matrix is a sparse boolean array indicating which features (var) were present in each dataset. The array has dimensions [n_datasets, n_var] and is stored in the SOMA Measurement varp collection. The first dimension is indexed by the soma_joinid in the census_datasets dataframe. The second is indexed by the soma_joinid in the var dataframe of the measurement.
As a reminder, the obs dataframe has a column dataset_id that can be used to link any cell in the Census to the presence matrix.
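The link from a cell to its presence-matrix row can be sketched on toy data. Everything below (the tiny tables, the "ds-a"/"ds-b" identifiers, the matrix values) is made up for illustration and only mimics the shape of the Census tables; it is a minimal sketch, not the Census API.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy stand-ins for the Census tables (all values are hypothetical).
datasets_df = pd.DataFrame({"soma_joinid": [0, 1], "dataset_id": ["ds-a", "ds-b"]})
obs_df = pd.DataFrame({"soma_joinid": [0, 1, 2], "dataset_id": ["ds-b", "ds-a", "ds-b"]})

# Rows are dataset joinids, columns are feature joinids.
presence = sparse.csr_matrix(np.array([[1, 0, 1], [1, 1, 1]], dtype=np.uint8))

# Map each cell's dataset_id to the dataset soma_joinid, i.e. its presence-matrix row.
joinid_by_dataset = datasets_df.set_index("dataset_id")["soma_joinid"]
obs_df["dataset_joinid"] = obs_df["dataset_id"].map(joinid_by_dataset)

# Number of features measured in the dataset each cell came from.
genes_per_dataset = np.asarray(presence.sum(axis=1)).ravel()
obs_df["n_measured_genes"] = genes_per_dataset[obs_df["dataset_joinid"].to_numpy()]
print(obs_df)
```

The same mapping pattern should carry over to the real obs and datasets dataframes, since both expose a dataset_id column.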
Contents
Opening the Census.
Fetching the IDs of the Census datasets.
Fetching the dataset presence matrix.
Identifying genes measured in a specific dataset.
Identifying datasets that measured specific genes.
Identifying all genes measured in a dataset.
⚠️ Note that the Census RNA data includes duplicate cells present across multiple datasets. Duplicate cells can be filtered in or out using the cell metadata variable is_primary_data, which is described in the Census schema.
Opening the Census
The cellxgene_census Python package contains a convenient API to open the latest version of the Census.
[1]:
import cellxgene_census
census = cellxgene_census.open_soma()
The "stable" release is currently 2023-07-25. Specify 'census_version="2023-07-25"' in future calls to open_soma() to ensure data consistency.
Fetching the IDs of the Census datasets
Let’s grab a table of all the datasets included in the Census and use this table in combination with the presence matrix below.
[2]:
# Grab the experiment containing human data, and the measurement therein with RNA
human = census["census_data"]["homo_sapiens"]
human_rna = human.ms["RNA"]
# The census-wide datasets
datasets_df = census["census_info"]["datasets"].read().concat().to_pandas()
datasets_df
[2]:
index | soma_joinid | collection_id | collection_name | collection_doi | dataset_id | dataset_title | dataset_h5ad_path | dataset_total_cell_count |
---|---|---|---|---|---|---|---|---|
0 | 0 | e2c257e7-6f79-487c-b81c-39451cd4ab3c | Spatial multiomics map of trophoblast developm... | 10.1038/s41586-023-05869-0 | f171db61-e57e-4535-a06a-35d8b6ef8f2b | donor_p13_trophoblasts | f171db61-e57e-4535-a06a-35d8b6ef8f2b.h5ad | 31497 |
1 | 1 | e2c257e7-6f79-487c-b81c-39451cd4ab3c | Spatial multiomics map of trophoblast developm... | 10.1038/s41586-023-05869-0 | ecf2e08e-2032-4a9e-b466-b65b395f4a02 | All donors trophoblasts | ecf2e08e-2032-4a9e-b466-b65b395f4a02.h5ad | 67070 |
2 | 2 | e2c257e7-6f79-487c-b81c-39451cd4ab3c | Spatial multiomics map of trophoblast developm... | 10.1038/s41586-023-05869-0 | 74cff64f-9da9-4b2a-9b3b-8a04a1598040 | All donors all cell states (in vivo) | 74cff64f-9da9-4b2a-9b3b-8a04a1598040.h5ad | 286326 |
3 | 3 | f7cecffa-00b4-4560-a29a-8ad626b8ee08 | Mapping single-cell transcriptomes in the intr... | 10.1016/j.ccell.2022.11.001 | 5af90777-6760-4003-9dba-8f945fec6fdf | Single-cell transcriptomic datasets of Renal c... | 5af90777-6760-4003-9dba-8f945fec6fdf.h5ad | 270855 |
4 | 4 | 3f50314f-bdc9-40c6-8e4a-b0901ebfbe4c | Single-cell sequencing links multiregional imm... | 10.1016/j.ccell.2021.03.007 | bd65a70f-b274-4133-b9dd-0d1431b6af34 | Single-cell sequencing links multiregional imm... | bd65a70f-b274-4133-b9dd-0d1431b6af34.h5ad | 167283 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
588 | 588 | 180bff9c-c8a5-4539-b13b-ddbc00d643e6 | Molecular characterization of selectively vuln... | 10.1038/s41593-020-00764-7 | f9ad5649-f372-43e1-a3a8-423383e5a8a2 | Molecular characterization of selectively vuln... | f9ad5649-f372-43e1-a3a8-423383e5a8a2.h5ad | 8168 |
589 | 589 | a72afd53-ab92-4511-88da-252fb0e26b9a | Single-cell atlas of peripheral immune respons... | 10.1038/s41591-020-0944-y | 456e8b9b-f872-488b-871d-94534090a865 | Single-cell atlas of peripheral immune respons... | 456e8b9b-f872-488b-871d-94534090a865.h5ad | 44721 |
590 | 590 | 38833785-fac5-48fd-944a-0f62a4c23ed1 | Construction of a human cell landscape at sing... | 10.1038/s41586-020-2157-4 | 2adb1f8a-a6b1-4909-8ee8-484814e2d4bf | Construction of a human cell landscape at sing... | 2adb1f8a-a6b1-4909-8ee8-484814e2d4bf.h5ad | 598266 |
591 | 591 | 5d445965-6f1a-4b68-ba3a-b8f765155d3a | A molecular cell atlas of the human lung from ... | 10.1038/s41586-020-2922-4 | e04daea4-4412-45b5-989e-76a9be070a89 | Krasnow Lab Human Lung Cell Atlas, Smart-seq2 | e04daea4-4412-45b5-989e-76a9be070a89.h5ad | 9409 |
592 | 592 | 5d445965-6f1a-4b68-ba3a-b8f765155d3a | A molecular cell atlas of the human lung from ... | 10.1038/s41586-020-2922-4 | 8c42cfd0-0b0a-46d5-910c-fc833d83c45e | Krasnow Lab Human Lung Cell Atlas, 10X | 8c42cfd0-0b0a-46d5-910c-fc833d83c45e.h5ad | 65662 |
593 rows × 8 columns
Fetching the dataset presence matrix
Now let’s fetch the dataset presence matrix.
For convenience, read the entire presence matrix (for Homo sapiens) into memory. The cellxgene_census package provides a convenience API for this, which returns the matrix as a SciPy sparse matrix (scipy.sparse).
[3]:
presence_matrix = cellxgene_census.get_presence_matrix(census, organism="Homo sapiens", measurement_name="RNA")
presence_matrix
[3]:
<593x60664 sparse matrix of type '<class 'numpy.uint8'>'
with 16133717 stored elements in Compressed Sparse Row format>
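For the summary shown above, the fraction of stored entries is 16133717 / (593 × 60664), roughly 0.45. The same kind of summary (genes per dataset, datasets per gene, overall density) can be sketched on a small hypothetical matrix; the values below are invented and only the SciPy and NumPy calls are real.

```python
import numpy as np
from scipy import sparse

# Hypothetical 3-dataset x 4-gene presence matrix (rows: datasets, columns: genes).
presence = sparse.csr_matrix(
    np.array([[1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 1, 0]], dtype=np.uint8)
)

# Number of genes measured by each dataset (row sums).
n_genes_per_dataset = np.asarray(presence.sum(axis=1)).ravel()

# Number of datasets that measured each gene (column sums).
n_datasets_per_gene = np.asarray(presence.sum(axis=0)).ravel()

# Fraction of (dataset, gene) pairs marked as present.
density = presence.nnz / (presence.shape[0] * presence.shape[1])

print(n_genes_per_dataset, n_datasets_per_gene, density)
```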
We also need the var dataframe, which is read into a Pandas DataFrame for convenient manipulation:
[4]:
var_df = human_rna.var.read().concat().to_pandas()
var_df
[4]:
index | soma_joinid | feature_id | feature_name | feature_length |
---|---|---|---|---|
0 | 0 | ENSG00000121410 | A1BG | 3999 |
1 | 1 | ENSG00000268895 | A1BG-AS1 | 3374 |
2 | 2 | ENSG00000148584 | A1CF | 9603 |
3 | 3 | ENSG00000175899 | A2M | 6318 |
4 | 4 | ENSG00000245105 | A2M-AS1 | 2948 |
... | ... | ... | ... | ... |
60659 | 60659 | ENSG00000288719 | RP4-669P10.21 | 4252 |
60660 | 60660 | ENSG00000288720 | RP11-852E15.3 | 7007 |
60661 | 60661 | ENSG00000288721 | RP5-973N23.5 | 7765 |
60662 | 60662 | ENSG00000288723 | RP11-553N16.6 | 1015 |
60663 | 60663 | ENSG00000288724 | RP13-546I2.2 | 625 |
60664 rows × 4 columns
Identifying genes measured in a specific dataset
Now that we have the dataset table, the genes metadata table, and the dataset presence matrix, we can check if a gene or set of genes were measured in a specific dataset.
Important: the presence matrix is indexed by soma_joinid, and is NOT positionally indexed. In other words:
the first dimension of the presence matrix is the dataset’s soma_joinid, as stored in the census_datasets dataframe.
the second dimension of the presence matrix is the feature’s soma_joinid, as stored in the var dataframe.
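To illustrate why the distinction matters, here is a minimal sketch on invented data: once a dataframe has been filtered and its index reset, a row's position no longer equals its soma_joinid, and indexing the matrix by position reads the wrong column. The feature names "g0" to "g3" and the matrix values are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical var table and presence matrix (rows: datasets, columns: feature joinids).
var_df = pd.DataFrame({"soma_joinid": [0, 1, 2, 3], "feature_id": ["g0", "g1", "g2", "g3"]})
presence = sparse.csr_matrix(np.array([[1, 0, 0, 1], [0, 1, 1, 1]], dtype=np.uint8))

# After filtering and resetting the index, positions no longer match joinids.
subset = var_df[var_df.feature_id.isin(["g2", "g3"])].reset_index(drop=True)

position = subset.feature_id.tolist().index("g3")  # position within the filtered table
joinid = int(subset.loc[subset.feature_id == "g3", "soma_joinid"].iloc[0])

wrong = bool(presence[0, position])  # reads the column of a different gene
right = bool(presence[0, joinid])    # reads the column that belongs to g3

print(position, joinid, wrong, right)
```

Looking up the soma_joinid column explicitly, as the notebook cells do, avoids this pitfall.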
Let’s find out if the gene "ENSG00000286096" was measured in the dataset with id "97a17473-e2b1-4f31-a544-44a60773e2dd".
[5]:
var_joinid = var_df.loc[var_df.feature_id == "ENSG00000286096"].soma_joinid
dataset_joinid = datasets_df.loc[datasets_df.dataset_id == "97a17473-e2b1-4f31-a544-44a60773e2dd"].soma_joinid
is_present = presence_matrix[dataset_joinid, var_joinid][0, 0]
print(f'Feature is {"present" if is_present else "not present"}.')
Feature is present.
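The same check extends to a set of genes. The sketch below uses toy tables and a hypothetical helper, genes_present, which is not part of cellxgene_census; it assumes the dataframes carry soma_joinid, dataset_id and feature_id columns as in the tables above.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def genes_present(presence, datasets_df, var_df, dataset_id, feature_ids):
    """Return {feature_id: bool} for one dataset (hypothetical helper)."""
    # Presence-matrix row: the dataset's soma_joinid.
    row = int(datasets_df.loc[datasets_df.dataset_id == dataset_id, "soma_joinid"].iloc[0])
    # Presence-matrix columns: the soma_joinids of the requested features.
    wanted = var_df[var_df.feature_id.isin(feature_ids)]
    cols = wanted.soma_joinid.to_numpy()
    flags = presence[[row], :][:, cols].toarray().ravel().astype(bool)
    return dict(zip(wanted.feature_id, flags.tolist()))


# Toy data; all identifiers and values are made up.
datasets_df = pd.DataFrame({"soma_joinid": [0, 1], "dataset_id": ["ds-a", "ds-b"]})
var_df = pd.DataFrame({"soma_joinid": [0, 1, 2], "feature_id": ["g0", "g1", "g2"]})
presence = sparse.csr_matrix(np.array([[1, 0, 1], [0, 1, 1]], dtype=np.uint8))

result = genes_present(presence, datasets_df, var_df, "ds-b", ["g0", "g2"])
print(result)
```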
Identifying datasets that measured specific genes
Similarly, we can determine the datasets that measured a specific gene or set of genes.
[6]:
# Grab the feature's soma_joinid from the var dataframe
var_joinid = var_df.loc[var_df.feature_id == "ENSG00000286096"].soma_joinid
# The presence matrix is indexed by the joinids of the dataset and var dataframes,
# so slice out the feature of interest by its joinid.
dataset_joinids = presence_matrix[:, var_joinid].tocoo().row
# From the datasets dataframe, slice out the datasets which have a joinid in the list
datasets_df.loc[datasets_df.soma_joinid.isin(dataset_joinids)]
[6]:
index | soma_joinid | collection_id | collection_name | collection_doi | dataset_id | dataset_title | dataset_h5ad_path | dataset_total_cell_count |
---|---|---|---|---|---|---|---|---|
5 | 5 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | ff45e623-7f5f-46e3-b47d-56be0341f66b | Tabula Sapiens - Pancreas | ff45e623-7f5f-46e3-b47d-56be0341f66b.h5ad | 13497 |
6 | 6 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | f01bdd17-4902-40f5-86e3-240d66dd2587 | Tabula Sapiens - Salivary_Gland | f01bdd17-4902-40f5-86e3-240d66dd2587.h5ad | 27199 |
7 | 7 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | e6a11140-2545-46bc-929e-da243eed2cae | Tabula Sapiens - Heart | e6a11140-2545-46bc-929e-da243eed2cae.h5ad | 11505 |
8 | 8 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | e5c63d94-593c-4338-a489-e1048599e751 | Tabula Sapiens - Bladder | e5c63d94-593c-4338-a489-e1048599e751.h5ad | 24583 |
9 | 9 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | d8732da6-8d1d-42d9-b625-f2416c30054b | Tabula Sapiens - Trachea | d8732da6-8d1d-42d9-b625-f2416c30054b.h5ad | 9522 |
11 | 11 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | cee11228-9f0b-4e57-afe2-cfe15ee56312 | Tabula Sapiens - Spleen | cee11228-9f0b-4e57-afe2-cfe15ee56312.h5ad | 34004 |
12 | 12 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | a357414d-2042-4eb5-95f0-c58604a18bdd | Tabula Sapiens - Small_Intestine | a357414d-2042-4eb5-95f0-c58604a18bdd.h5ad | 12467 |
14 | 14 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | a0754256-f44b-4c4a-962c-a552e47d3fdc | Tabula Sapiens - Eye | a0754256-f44b-4c4a-962c-a552e47d3fdc.h5ad | 10650 |
15 | 15 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | 983d5ec9-40e8-4512-9e65-a572a9c486cb | Tabula Sapiens - Blood | 983d5ec9-40e8-4512-9e65-a572a9c486cb.h5ad | 50115 |
19 | 19 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | 5e5e7a2f-8f1c-42ac-90dc-b4f80f38e84c | Tabula Sapiens - Fat | 5e5e7a2f-8f1c-42ac-90dc-b4f80f38e84c.h5ad | 20263 |
20 | 20 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | 55cf0ea3-9d2b-4294-871e-bb4b49a79fc7 | Tabula Sapiens - Tongue | 55cf0ea3-9d2b-4294-871e-bb4b49a79fc7.h5ad | 15020 |
21 | 21 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | 4f1555bc-4664-46c3-a606-78d34dd10d92 | Tabula Sapiens - Bone_Marrow | 4f1555bc-4664-46c3-a606-78d34dd10d92.h5ad | 12297 |
23 | 23 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | 2423ce2c-3149-4cca-a2ff-cf682ea29b5f | Tabula Sapiens - Kidney | 2423ce2c-3149-4cca-a2ff-cf682ea29b5f.h5ad | 9641 |
24 | 24 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | 1c9eb291-6d31-47e1-96b2-129b5e1ae64f | Tabula Sapiens - Muscle | 1c9eb291-6d31-47e1-96b2-129b5e1ae64f.h5ad | 30746 |
25 | 25 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | 18eb630b-a754-4111-8cd4-c24ec80aa5ec | Tabula Sapiens - Lymph_Node | 18eb630b-a754-4111-8cd4-c24ec80aa5ec.h5ad | 53275 |
26 | 26 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | 0d2ee4ac-05ee-40b2-afb6-ebb584caa867 | Tabula Sapiens - Lung | 0d2ee4ac-05ee-40b2-afb6-ebb584caa867.h5ad | 35682 |
27 | 27 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | 0ced5e76-6040-47ff-8a72-93847965afc0 | Tabula Sapiens - Thymus | 0ced5e76-6040-47ff-8a72-93847965afc0.h5ad | 33664 |
43 | 43 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | 8e10f1c4-8e98-41e5-b65f-8cd89a887122 | All neurons | 8e10f1c4-8e98-41e5-b65f-8cd89a887122.h5ad | 2480956 |
139 | 139 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | fe1a73ab-a203-45fd-84e9-0f7fd19efcbd | Dissection: Amygdaloid complex (AMY) - basolat... | fe1a73ab-a203-45fd-84e9-0f7fd19efcbd.h5ad | 35285 |
143 | 143 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | f8dda921-5fb4-4c94-a654-c6fc346bfd6d | Dissection: Cerebral cortex (Cx) - Occipitotem... | f8dda921-5fb4-4c94-a654-c6fc346bfd6d.h5ad | 31899 |
160 | 160 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | dd03ce70-3243-4c96-9561-330cc461e4d7 | Dissection: Cerebral cortex (Cx) - Perirhinal ... | dd03ce70-3243-4c96-9561-330cc461e4d7.h5ad | 23732 |
165 | 165 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | d2b5efc1-14c6-4b5f-bd98-40f9084872d7 | Dissection: Tail of Hippocampus (HiT) - Caudal... | d2b5efc1-14c6-4b5f-bd98-40f9084872d7.h5ad | 36886 |
175 | 175 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | c4b03352-af8d-492a-8d6b-40f304e0a122 | Supercluster: Medium spiny neuron | c4b03352-af8d-492a-8d6b-40f304e0a122.h5ad | 152189 |
176 | 176 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | c2aad8fc-b63b-4f9b-9cfd-baf7bc9c1771 | Dissection: Cerebral cortex (Cx) - Temporal po... | c2aad8fc-b63b-4f9b-9cfd-baf7bc9c1771.h5ad | 37642 |
177 | 177 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | c202b243-1aa1-4b16-bc9a-b36241f3b1e3 | Supercluster: Amygdala excitatory | c202b243-1aa1-4b16-bc9a-b36241f3b1e3.h5ad | 109452 |
178 | 178 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | bdb26abd-f4ba-4ea3-8862-c2340e7a4f55 | Supercluster: CGE interneuron | bdb26abd-f4ba-4ea3-8862-c2340e7a4f55.h5ad | 227671 |
183 | 183 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | acae7679-d077-461c-b857-ee6ccfeb267f | Dissection: Head of hippocampus (HiH) - CA1 | acae7679-d077-461c-b857-ee6ccfeb267f.h5ad | 39147 |
196 | 196 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | 9372df2d-13d6-4fac-980b-919a5b7eb483 | Dissection: Midbrain (M) - Periaqueductal gray... | 9372df2d-13d6-4fac-980b-919a5b7eb483.h5ad | 33794 |
197 | 197 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | 93131426-0124-4ab4-a013-9dfbcd99d467 | Dissection: Epithalamus - ETH | 93131426-0124-4ab4-a013-9dfbcd99d467.h5ad | 24327 |
206 | 206 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | 7c1c3d47-3166-43e5-9a95-65ceb2d45f78 | Dissection: Pons (Pn) - Pontine reticular form... | 7c1c3d47-3166-43e5-9a95-65ceb2d45f78.h5ad | 49512 |
208 | 208 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | 7a0a8891-9a22-4549-a55b-c2aca23c3a2a | Supercluster: Hippocampal CA1-3 | 7a0a8891-9a22-4549-a55b-c2aca23c3a2a.h5ad | 74979 |
220 | 220 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | 5e5ab909-f73f-4b57-98a0-6d2c5662f6a4 | Dissection: Midbrain (M) - Inferior colliculus... | 5e5ab909-f73f-4b57-98a0-6d2c5662f6a4.h5ad | 32306 |
243 | 243 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | 3f56901c-dd4a-47d6-b60b-7b0c0111cfb2 | Dissection: Head of hippocampus (HiH) - CA1-3 | 3f56901c-dd4a-47d6-b60b-7b0c0111cfb2.h5ad | 37911 |
245 | 245 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | 3a7f3ab4-a280-4b3b-b2c0-6dd05614a78c | Supercluster: Splatter | 3a7f3ab4-a280-4b3b-b2c0-6dd05614a78c.h5ad | 291833 |
249 | 249 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | 35c8a04c-8639-4d15-8228-765d8d93fc96 | Dissection: Hypothalamus (HTH) - supraoptic re... | 35c8a04c-8639-4d15-8228-765d8d93fc96.h5ad | 16753 |
270 | 270 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | 07b1d7c8-5c2e-42f7-9246-26f746cd6013 | Dissection: Myelencephalon (medulla oblongata)... | 07b1d7c8-5c2e-42f7-9246-26f746cd6013.h5ad | 27210 |
273 | 273 | 283d65eb-dd53-496d-adb7-7570c7caa443 | Transcriptomic diversity of cell types across ... | 10.1101/2022.10.12.511898 | 0325478a-9b52-45b5-b40a-2e2ab0d72eb1 | Supercluster: Upper-layer intratelencephalic | 0325478a-9b52-45b5-b40a-2e2ab0d72eb1.h5ad | 455006 |
475 | 475 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | 53d208b0-2cfd-4366-9866-c3c6114081bc | Tabula Sapiens - All Cells | 53d208b0-2cfd-4366-9866-c3c6114081bc.h5ad | 483152 |
476 | 476 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | a68b64d8-aee3-4947-81b7-36b8fe5a44d2 | Tabula Sapiens - Stromal | a68b64d8-aee3-4947-81b7-36b8fe5a44d2.h5ad | 82478 |
477 | 477 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | c5d88abe-f23a-45fa-a534-788985e93dad | Tabula Sapiens - Immune | c5d88abe-f23a-45fa-a534-788985e93dad.h5ad | 264824 |
478 | 478 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | 5a11f879-d1ef-458a-910c-9b0bdfca5ebf | Tabula Sapiens - Endothelial | 5a11f879-d1ef-458a-910c-9b0bdfca5ebf.h5ad | 31691 |
479 | 479 | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | 97a17473-e2b1-4f31-a544-44a60773e2dd | Tabula Sapiens - Epithelial | 97a17473-e2b1-4f31-a544-44a60773e2dd.h5ad | 104148 |
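For a set of genes, one may want the datasets that measured any of them or only those that measured all of them. A minimal sketch on invented data follows; the dataset and feature identifiers are hypothetical, and counting hits per row is just one possible approach.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy tables; all identifiers and values are made up.
datasets_df = pd.DataFrame(
    {"soma_joinid": [0, 1, 2, 3], "dataset_id": ["ds-a", "ds-b", "ds-c", "ds-d"]}
)
var_df = pd.DataFrame({"soma_joinid": [0, 1, 2], "feature_id": ["g0", "g1", "g2"]})
presence = sparse.csr_matrix(
    np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 1, 0]], dtype=np.uint8)
)

# Columns of the genes of interest, selected by soma_joinid.
cols = var_df.loc[var_df.feature_id.isin(["g0", "g2"]), "soma_joinid"].to_numpy()

# For each dataset, count how many of the requested genes it measured.
hits = np.asarray(presence[:, cols].sum(axis=1)).ravel()

any_joinids = np.flatnonzero(hits > 0)          # measured at least one gene
all_joinids = np.flatnonzero(hits == len(cols)) # measured every gene

measured_any = datasets_df[datasets_df.soma_joinid.isin(any_joinids)]
measured_all = datasets_df[datasets_df.soma_joinid.isin(all_joinids)]
print(measured_any.dataset_id.tolist(), measured_all.dataset_id.tolist())
```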
Identifying all genes measured in a dataset
Finally, we can find the set of genes that were measured in the cells of a given dataset.
[7]:
# Slice the dataset(s) of interest, and get the joinid(s)
dataset_joinids = datasets_df.loc[datasets_df.collection_id == "17481d16-ee44-49e5-bcf0-28c0780d8c4a"].soma_joinid
# Slice the presence matrix by the first dimension, i.e., by dataset
var_joinids = presence_matrix[dataset_joinids, :].tocoo().col
# From the feature (var) dataframe, slice out features which have a joinid in the list.
var_df.loc[var_df.soma_joinid.isin(var_joinids)]
[7]:
index | soma_joinid | feature_id | feature_name | feature_length |
---|---|---|---|---|
0 | 0 | ENSG00000121410 | A1BG | 3999 |
1 | 1 | ENSG00000268895 | A1BG-AS1 | 3374 |
2 | 2 | ENSG00000148584 | A1CF | 9603 |
3 | 3 | ENSG00000175899 | A2M | 6318 |
4 | 4 | ENSG00000245105 | A2M-AS1 | 2948 |
... | ... | ... | ... | ... |
58109 | 58109 | ENSG00000277745 | H2AB3 | 591 |
58354 | 58354 | ENSG00000233522 | FAM224A | 2031 |
58411 | 58411 | ENSG00000183146 | PRORY | 878 |
58523 | 58523 | ENSG00000279274 | RP11-533E23.2 | 75 |
58632 | 58632 | ENSG00000277836 | ENSG00000277836.1 | 288 |
27211 rows × 4 columns
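When the selection covers several datasets, as with a collection_id filter, the slicing above yields the union of genes measured in any selected dataset. The sketch below, on made-up data with hypothetical identifiers, contrasts that union with the genes shared by every selected dataset.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy tables; all identifiers and values are made up.
datasets_df = pd.DataFrame(
    {"soma_joinid": [0, 1, 2], "collection_id": ["col-x", "col-x", "col-y"]}
)
var_df = pd.DataFrame({"soma_joinid": [0, 1, 2, 3], "feature_id": ["g0", "g1", "g2", "g3"]})
presence = sparse.csr_matrix(
    np.array([[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1]], dtype=np.uint8)
)

# Presence-matrix rows for every dataset in the collection of interest.
rows = datasets_df.loc[datasets_df.collection_id == "col-x", "soma_joinid"].to_numpy()

# For each gene, count how many of the selected datasets measured it.
counts = np.asarray(presence[rows, :].sum(axis=0)).ravel()

union_joinids = np.flatnonzero(counts > 0)           # measured in at least one dataset
shared_joinids = np.flatnonzero(counts == len(rows)) # measured in every dataset

measured_in_any = var_df[var_df.soma_joinid.isin(union_joinids)]
measured_in_all = var_df[var_df.soma_joinid.isin(shared_joinids)]
print(measured_in_any.feature_id.tolist(), measured_in_all.feature_id.tolist())
```

The shared set is usually the safer choice when comparing expression across datasets, because a gene absent from one dataset cannot be distinguished from a gene with zero counts there.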