czbenchmarks.datasets.single_cell
Attributes
Classes
SingleCellDataset: Abstract base class for single cell datasets containing gene expression data.
Module Contents
- czbenchmarks.datasets.single_cell.logger
- class czbenchmarks.datasets.single_cell.SingleCellDataset(dataset_type_name: str, path: pathlib.Path, organism: czbenchmarks.datasets.types.Organism, task_inputs_dir: pathlib.Path | None = None)
Bases:
czbenchmarks.datasets.dataset.Dataset
Abstract base class for single cell datasets containing gene expression data.
Handles loading and validation of AnnData objects with the following requirements:
- Must have gene names in adata.var['ensembl_id'] or adata.var_names.
- Gene names must start with the organism prefix (e.g., "ENSG" for human).
- Must contain raw counts in adata.X (non-negative integers).
- Should be stored in H5AD format.
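The gene-name and raw-count requirements above can be illustrated with a minimal sketch. It is not the library's own validation code: the helper name `find_requirement_violations` is hypothetical, and it works on plain Python sequences so that it runs without any third-party packages.

```python
from typing import Iterable, List, Sequence


def find_requirement_violations(
    gene_ids: Iterable[str],
    counts: Iterable[Sequence[float]],
    prefix: str = "ENSG",
) -> List[str]:
    """Return human-readable problems; an empty list means both checks passed.

    Hypothetical helper for illustration only, not part of czbenchmarks.
    """
    problems: List[str] = []

    # Requirement: every gene name starts with the organism prefix.
    bad_ids = [g for g in gene_ids if not str(g).startswith(prefix)]
    if bad_ids:
        problems.append(f"{len(bad_ids)} gene ID(s) do not start with {prefix!r}")

    # Requirement: raw counts are non-negative integers.
    # float.is_integer() is False for NaN and infinity, so those are rejected too.
    for row in counts:
        if any(v < 0 or not float(v).is_integer() for v in row):
            problems.append("counts must be non-negative integers")
            break

    return problems


# Two human gene IDs with integer-valued counts pass both checks.
ok = find_requirement_violations(
    ["ENSG00000141510", "ENSG00000012048"],
    [[0, 3], [5.0, 1]],
)
```

With a loaded AnnData object, the same checks would presumably be fed from adata.var['ensembl_id'] (or adata.var_names) and from the rows of adata.X, converting a sparse matrix to a dense or per-row form first.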
- adata
Loaded AnnData object containing gene expression data.
- Type:
ad.AnnData
Initialize a SingleCellDataset instance.
- Parameters:
- adata: anndata.AnnData
- load_data(backed: Literal['r', 'r+'] | bool | None = None) -> None
Load the dataset from the path.
This method reads the dataset file in H5AD format and loads it into the adata attribute as an AnnData object.
- Parameters:
backed (Literal['r', 'r+'] | bool | None) – Whether to load the dataset fully into memory or open it in backed mode. False or None loads into memory; the default is None. True, 'r' (read-only), or 'r+' (read-write) selects backed mode.
- Populates:
adata (ad.AnnData): Loaded AnnData object containing gene expression data.
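The split between memory and backed values of `backed` can be sketched as follows. This is an illustration under assumptions, not the method's actual implementation: `uses_backed_mode` and `read_dataset` are hypothetical helpers, and the sketch assumes the file is read with `anndata.read_h5ad`, whose `backed` parameter accepts the same values.

```python
from pathlib import Path
from typing import Literal, Union

BackedArg = Union[Literal["r", "r+"], bool, None]


def uses_backed_mode(backed: BackedArg) -> bool:
    """Return True when the argument requests backed (on-disk) access.

    Hypothetical helper mirroring the parameter description above.
    """
    if backed is None or backed is False:
        return False  # memory mode: the whole file is loaded
    if backed is True or backed in ("r", "r+"):
        return True  # backed mode: data stays on disk
    raise ValueError(f"unsupported value for backed: {backed!r}")


def read_dataset(path: Path, backed: BackedArg = None):
    """Read an H5AD file, passing `backed` through unchanged.

    Assumption: anndata is installed. It is imported lazily so the helper
    above stays usable without it.
    """
    import anndata as ad

    return ad.read_h5ad(path, backed=backed)
```

Backed mode is mainly useful when the H5AD file is too large to hold in memory. A call such as dataset.load_data(backed='r') would then leave the expression matrix on disk and read it on demand.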