czbenchmarks.datasets.single_cell_labeled

Attributes

logger

Classes

SingleCellLabeledDataset

Single cell dataset containing gene expression data and a label column.

Module Contents

czbenchmarks.datasets.single_cell_labeled.logger
class czbenchmarks.datasets.single_cell_labeled.SingleCellLabeledDataset(path: pathlib.Path, organism: czbenchmarks.datasets.types.Organism, label_column_key: str = 'cell_type', task_inputs_dir: pathlib.Path | None = None)

Bases: czbenchmarks.datasets.single_cell.SingleCellDataset

Single cell dataset containing gene expression data and a label column.

This class extends SingleCellDataset to include a label column that contains the expected prediction values for each cell. The labels are extracted from the specified column in adata.obs and stored as a pd.Series in the labels attribute.

labels

Extracted labels for each cell.

Type:

pd.Series

label_column_key

Key for the column in adata.obs containing the labels.

Type:

str

Initialize a SingleCellLabeledDataset instance.

Parameters:
  • path (Path) – Path to the dataset file.

  • organism (Organism) – Enum value indicating the organism.

  • label_column_key (str) – Key for the column in adata.obs containing the labels. Defaults to 'cell_type'.

  • task_inputs_dir (Optional[Path]) – Directory for storing task-specific inputs. Defaults to None.

labels: pandas.Series
label_column_key: str
load_data() → None

Load the dataset and extract labels.

This method loads the dataset using the parent class’s load_data method and extracts the labels from the specified column in adata.obs.

Populates:

labels (pd.Series): Extracted labels for each cell.
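
The load-then-extract pattern described above can be sketched as follows. This is a minimal illustration, not the library's implementation: ParentDataset and LabeledDataset are hypothetical stand-ins for SingleCellDataset and SingleCellLabeledDataset, a plain dict stands in for adata.obs, and the KeyError raised for a missing column is an assumed behaviour.

```python
# Stand-in sketch: these classes are hypothetical and only mimic the
# load-then-extract pattern; they are not the czbenchmarks classes.


class ParentDataset:
    """Hypothetical stand-in for SingleCellDataset."""

    def __init__(self, obs):
        self._obs_source = obs  # pretend this is what sits on disk
        self.obs = None  # stand-in for adata.obs, filled by load_data()

    def load_data(self) -> None:
        # The real parent reads a dataset file; here we just copy a dict.
        self.obs = {key: list(values) for key, values in self._obs_source.items()}


class LabeledDataset(ParentDataset):
    """Hypothetical stand-in for SingleCellLabeledDataset."""

    def __init__(self, obs, label_column_key="cell_type"):
        super().__init__(obs)
        self.label_column_key = label_column_key
        self.labels = None

    def load_data(self) -> None:
        super().load_data()  # step 1: let the parent load the data
        if self.label_column_key not in self.obs:
            # Assumed behaviour: fail loudly when the column is absent.
            raise KeyError(f"Label column {self.label_column_key!r} not found")
        # step 2: pull the label column out of the per-cell metadata
        self.labels = self.obs[self.label_column_key]


dataset = LabeledDataset({"cell_type": ["T cell", "B cell", "T cell"]})
dataset.load_data()
print(dataset.labels)
```

In this sketch, labels stays None until load_data() has been called, which mirrors the "Populates" note above.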

store_task_inputs() → pathlib.Path

Store task-specific inputs, such as cell type annotations.

This method stores the extracted labels in a JSON file. The filename is dynamically generated based on the label_column_key.

Returns:

Path to the directory storing the task input files.

Return type:

Path
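
As a rough sketch of what storing labels in a JSON file named after label_column_key could look like, using only the standard library. The helper store_labels and the "<label_column_key>_labels.json" naming scheme are assumptions made for illustration; the filename the library actually generates is not stated here and may differ.

```python
import json
import tempfile
from pathlib import Path


def store_labels(labels, label_column_key, task_inputs_dir):
    """Write labels to a JSON file and return the containing directory."""
    task_inputs_dir = Path(task_inputs_dir)
    task_inputs_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme; the library's actual filename may differ.
    labels_file = task_inputs_dir / f"{label_column_key}_labels.json"
    with labels_file.open("w", encoding="utf-8") as handle:
        json.dump(list(labels), handle)
    # Like the documented method, return the directory, not the file.
    return task_inputs_dir


with tempfile.TemporaryDirectory() as tmp:
    out_dir = store_labels(["T cell", "B cell"], "cell_type", Path(tmp) / "task_inputs")
    stored = json.loads((out_dir / "cell_type_labels.json").read_text(encoding="utf-8"))
```

Returning the directory rather than the file path matches the documented return value, so a caller can locate every task input file from one location.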