czbenchmarks.datasets.single_cell

Attributes

logger

Classes

SingleCellDataset

Abstract base class for single cell datasets containing gene expression data.

Module Contents

czbenchmarks.datasets.single_cell.logger
class czbenchmarks.datasets.single_cell.SingleCellDataset(dataset_type_name: str, path: pathlib.Path, organism: czbenchmarks.datasets.types.Organism, task_inputs_dir: pathlib.Path | None = None)

Bases: czbenchmarks.datasets.dataset.Dataset

Abstract base class for single cell datasets containing gene expression data.

Handles loading and validation of AnnData objects with the following requirements:

  • Must have gene names in adata.var['ensembl_id'] or adata.var_names.

  • Gene names must start with the organism prefix (e.g., "ENSG" for human).

  • Must contain raw counts in adata.X (non-negative integers).

  • Should be stored in H5AD format.
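The requirements above can be illustrated with a minimal sketch of the two content checks (organism prefix and raw counts). This is not the class's actual validation code. The helper names `genes_have_prefix` and `looks_like_raw_counts` are hypothetical, and plain Python sequences stand in for the AnnData fields so that the example runs without extra dependencies.

```python
from typing import Iterable, Sequence

# Assumption: only the human prefix "ENSG" is named in the text above;
# prefixes for other organisms are not listed here.
HUMAN_PREFIX = "ENSG"


def genes_have_prefix(gene_ids: Iterable[str], prefix: str) -> bool:
    """Return True if every gene ID starts with the organism prefix."""
    return all(str(gene_id).startswith(prefix) for gene_id in gene_ids)


def looks_like_raw_counts(rows: Iterable[Sequence[float]]) -> bool:
    """Return True if every value is a non-negative whole number."""
    return all(
        value >= 0 and float(value).is_integer()
        for row in rows
        for value in row
    )


# Hypothetical inputs standing in for gene IDs and a small count matrix.
gene_ids = ["ENSG00000141510", "ENSG00000012048"]
counts = [[0, 3], [1, 7]]

print(genes_have_prefix(gene_ids, HUMAN_PREFIX))
print(looks_like_raw_counts(counts))
```

With a real AnnData object, the gene IDs would presumably come from adata.var['ensembl_id'] or adata.var_names and the rows from adata.X. A sparse adata.X would need a different traversal than the nested-sequence loop used here.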

adata

Loaded AnnData object containing gene expression data.

Type:

ad.AnnData

Initialize a SingleCellDataset instance.

Parameters:
  • dataset_type_name (str) – Name of the dataset type (used for directory naming).

  • path (Path) – Path to the dataset file.

  • organism (Organism) – Enum value indicating the organism.

  • task_inputs_dir (Optional[Path]) – Directory for storing task-specific inputs.

adata: anndata.AnnData
load_data(backed: Literal['r', 'r+'] | bool | None = None) → None

Load the dataset from the path.

This method reads the dataset file in H5AD format and loads it into the adata attribute as an AnnData object.

Parameters:

backed (Literal['r', 'r+'] | bool | None) – Whether to load the dataset into memory or open it in backed mode. False or None (the default) loads the data into memory. True, 'r' (read-only), or 'r+' (read-write) opens the file in backed mode.

Populates:

adata (ad.AnnData): Loaded AnnData object containing gene expression data.
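The accepted values of backed can be summarized in a small sketch. The helper `uses_backed_mode` is hypothetical and not part of czbenchmarks. It only encodes the memory/backed split described above, and rejecting any other value is an assumption of this sketch.

```python
from typing import Literal, Union

# Same set of values as the documented type of the `backed` parameter.
BackedArg = Union[Literal["r", "r+"], bool, None]


def uses_backed_mode(backed: BackedArg = None) -> bool:
    """Return True if `backed` selects backed mode, False for in-memory loading."""
    # None (the default) and False both mean "load into memory".
    if backed is None or backed is False:
        return False
    # True, 'r' (read-only), and 'r+' (read-write) all select backed mode.
    if backed is True or backed in ("r", "r+"):
        return True
    # Assumption: anything else is treated as an error in this sketch.
    raise ValueError(f"Unsupported value for backed: {backed!r}")


for value in (None, False, True, "r", "r+"):
    print(repr(value), uses_backed_mode(value))
```

Backed mode is generally the better fit when the H5AD file is too large to hold in memory, since the expression matrix stays on disk. In-memory loading avoids keeping a file handle open.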