czbenchmarks.datasets.single_cell
=================================

.. py:module:: czbenchmarks.datasets.single_cell


Attributes
----------

.. autoapisummary::

   czbenchmarks.datasets.single_cell.logger


Classes
-------

.. autoapisummary::

   czbenchmarks.datasets.single_cell.SingleCellDataset
   czbenchmarks.datasets.single_cell.PerturbationSingleCellDataset


Module Contents
---------------

.. py:data:: logger

.. py:class:: SingleCellDataset(path: str, organism: czbenchmarks.datasets.types.Organism)

   Bases: :py:obj:`czbenchmarks.datasets.base.BaseDataset`


   Single-cell dataset containing gene expression data and metadata.

   Handles loading and validation of AnnData objects with gene expression data
   and associated metadata for a specific organism.


   .. py:method:: load_data() -> None

      Load the dataset into memory.

      Reads the AnnData object from the h5ad file at ``path`` and stores it
      on the instance, where other methods and the ``adata`` property can
      access it.


   .. py:method:: unload_data() -> None

      Unload the dataset from memory.

      Sets the loaded AnnData object to ``None`` so that its memory can be
      freed. This is used to drop memory-intensive data before
      serialization, since serializing large raw data artifacts is
      inefficient and error-prone.
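
      The load/unload lifecycle can be sketched with a small stand-in
      class. ``ToyDataset`` is hypothetical and is not part of czbenchmarks;
      it assumes the loaded data lives in a single instance attribute and
      only illustrates why clearing that attribute before pickling keeps
      the serialized object small.

      .. code-block:: python

         import pickle
         from typing import List, Optional


         class ToyDataset:
             """Hypothetical stand-in; not the czbenchmarks implementation."""

             def __init__(self, path: str) -> None:
                 self.path = path
                 # Loaded data lives in an instance attribute; None means unloaded.
                 self.data: Optional[List[float]] = None

             def load_data(self) -> None:
                 # Placeholder for reading a large file (e.g. h5ad) from self.path.
                 self.data = [0.0] * 100_000

             def unload_data(self) -> None:
                 # Drop the reference so the memory can be reclaimed and the
                 # raw data is not included when the object is serialized.
                 self.data = None


         ds = ToyDataset("example.h5ad")
         ds.load_data()
         loaded_size = len(pickle.dumps(ds))
         ds.unload_data()
         unloaded_size = len(pickle.dumps(ds))
         print(loaded_size > unloaded_size)  # True

      In ``SingleCellDataset`` the corresponding attribute is the AnnData
      object, which ``unload_data`` sets to ``None``.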


   .. py:property:: organism
      :type: czbenchmarks.datasets.types.Organism


   .. py:property:: adata
      :type: anndata.AnnData


.. py:class:: PerturbationSingleCellDataset(path: str, organism: czbenchmarks.datasets.types.Organism, condition_key: str = 'condition', split_key: str = 'split')

   Bases: :py:obj:`SingleCellDataset`


   Single-cell dataset with perturbation data, containing both control and
   perturbed cells.

   Input data requirements:

   - An H5AD file containing single-cell gene expression data.
   - A condition column in ``adata.obs`` (named by ``condition_key``, default
     ``"condition"``) that labels each cell as control (``ctrl``) or
     perturbed.
   - A split column in ``adata.obs`` (named by ``split_key``, default
     ``"split"``) that identifies test samples.
   - Condition values in one of the following formats:

     - ``ctrl`` for control samples
     - ``{gene}+ctrl`` for single-gene perturbations
     - ``{gene1}+{gene2}`` for combinatorial perturbations
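
   The condition formats and column requirements can be checked with a
   couple of small helpers. ``parse_condition`` and ``check_obs_columns``
   are hypothetical and are not czbenchmarks functions; the sketch assumes
   ``+`` only ever separates two entries, that ``ctrl`` is never used as a
   gene symbol, and uses a plain list of names in place of
   ``adata.obs.columns``. The gene symbols in the usage lines are
   illustrative.

   .. code-block:: python

      from typing import Iterable, Tuple


      def parse_condition(condition: str) -> Tuple[str, Tuple[str, ...]]:
          """Classify a condition string as control, single, or combinatorial.

          Hypothetical helper; assumes '+' only ever separates two entries and
          that 'ctrl' is never used as a gene symbol.
          """
          if condition == "ctrl":
              return "control", ()
          parts = condition.split("+")
          # Reject empty parts, more than two entries, and 'ctrl+{gene}'.
          if len(parts) != 2 or not all(parts) or parts[0] == "ctrl":
              raise ValueError(f"unrecognized condition format: {condition!r}")
          gene, other = parts
          if other == "ctrl":
              return "single", (gene,)
          return "combinatorial", (gene, other)


      def check_obs_columns(
          columns: Iterable[str], condition_key: str = "condition", split_key: str = "split"
      ) -> None:
          """Raise KeyError if the condition or split column is missing."""
          present = set(columns)
          missing = [key for key in (condition_key, split_key) if key not in present]
          if missing:
              raise KeyError(f"missing obs columns: {missing}")


      # A plain list of names stands in for adata.obs.columns here.
      check_obs_columns(["condition", "split", "cell_type"])
      print(parse_condition("ctrl"))         # ('control', ())
      print(parse_condition("KLF1+ctrl"))    # ('single', ('KLF1',))
      print(parse_condition("KLF1+MAP2K6"))  # ('combinatorial', ('KLF1', 'MAP2K6'))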


   .. py:method:: load_data() -> None

      Load the dataset into memory.

      Overrides :py:meth:`SingleCellDataset.load_data`, which reads an
      AnnData object from an h5ad file. The loaded data is stored as
      instance attributes that other methods can access.


   .. py:method:: unload_data() -> None

      Unload the dataset from memory.

      Overrides :py:meth:`SingleCellDataset.unload_data`. Instance
      attributes holding loaded data are cleared or set to ``None`` to free
      memory before serialization, since serializing large raw data
      artifacts is inefficient and error-prone.


   .. py:property:: perturbation_truth
      :type: Dict[str, pandas.DataFrame]


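   .. py:property:: condition_key
      :type: str

      Name of the ``adata.obs`` column that holds perturbation conditions,
      set by the ``condition_key`` constructor argument (default
      ``"condition"``).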


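   .. py:property:: split_key
      :type: str

      Name of the ``adata.obs`` column that identifies test samples, set by
      the ``split_key`` constructor argument (default ``"split"``).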