czbenchmarks.datasets.utils
Functions
| load_dataset | Load, download (if needed), and instantiate a dataset using Hydra configuration. |
| list_available_datasets | Return a sorted list of all dataset names defined in the datasets.yaml Hydra configuration. |
| load_local_dataset | Instantiate a dataset directly from arguments without requiring a YAML file. |
Module Contents
- czbenchmarks.datasets.utils.load_dataset(dataset_name: str, config_path: str | None = None) -> czbenchmarks.datasets.dataset.Dataset
Load, download (if needed), and instantiate a dataset using Hydra configuration.
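- Parameters:
dataset_name (str) – Name of the dataset to load, as defined in the configuration.
config_path (str | None) – Optional path to a custom config file. Defaults to None.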
- Returns:
Instantiated dataset object with data loaded.
- Return type:
czbenchmarks.datasets.dataset.Dataset
- Raises:
FileNotFoundError – If the custom config file does not exist.
ValueError – If the specified dataset is not found in the configuration.
Notes
If a custom config is provided, it is merged with the default config.
If a remote path is specified, the dataset file is downloaded with download_file_from_remote.
Uses Hydra for instantiation and configuration management.
The returned dataset object is an instance of the Dataset class or its subclass.
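The two documented exceptions can be handled separately by the caller. The following is a minimal sketch, not part of czbenchmarks: `try_load` is a hypothetical helper that accepts any function with the `load_dataset` call shape, and `"some_dataset_name"` is a placeholder rather than a real dataset name.

```python
from typing import Any, Callable, Optional


def try_load(
    loader: Callable[..., Any],
    dataset_name: str,
    config_path: Optional[str] = None,
) -> Optional[Any]:
    """Call a load_dataset-style function and report the documented errors.

    `loader` is assumed to accept (dataset_name, config_path=...), matching
    the load_dataset signature shown above. Returns None on failure.
    """
    try:
        return loader(dataset_name, config_path=config_path)
    except FileNotFoundError as exc:
        # Documented case: the custom config file does not exist.
        print(f"Config file problem: {exc}")
    except ValueError as exc:
        # Documented case: the dataset name is not in the configuration.
        print(f"Unknown dataset {dataset_name!r}: {exc}")
    return None


# Intended use, assuming czbenchmarks is installed (not executed here):
#   from czbenchmarks.datasets.utils import load_dataset
#   dataset = try_load(load_dataset, "some_dataset_name")
```

Passing the loader in as an argument keeps the helper independent of the package, so it can be exercised with a stub function in tests.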
- czbenchmarks.datasets.utils.list_available_datasets() -> Dict[str, Dict[str, str]]
Return a sorted list of all dataset names defined in the datasets.yaml Hydra configuration.
- Returns:
Alphabetically sorted list of available dataset names.
- Return type:
List[str]
Notes
Loads configuration using Hydra.
Extracts dataset names from the datasets section of the configuration.
Sorts the dataset names alphabetically for easier readability.
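The extract-and-sort steps in the notes above can be illustrated without Hydra. This is a sketch under stated assumptions: a plain dictionary stands in for the loaded configuration, `sorted_dataset_names` is a hypothetical helper, and the dataset names and paths are invented for the example. It is not the library's implementation.

```python
from typing import Any, Dict, List


def sorted_dataset_names(config: Dict[str, Any]) -> List[str]:
    """Return the keys of the 'datasets' section in alphabetical order."""
    # A missing or empty 'datasets' section yields an empty list.
    datasets = config.get("datasets") or {}
    return sorted(datasets)


# Hypothetical configuration; a plain dict stands in for the Hydra config.
example_config = {
    "datasets": {
        "zebra_demo": {"path": "z.h5ad"},
        "alpha_demo": {"path": "a.h5ad"},
    }
}

print(sorted_dataset_names(example_config))  # ['alpha_demo', 'zebra_demo']
```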
- czbenchmarks.datasets.utils.load_local_dataset(dataset_class: str, organism: czbenchmarks.datasets.types.Organism, path: str | pathlib.Path, **kwargs) -> czbenchmarks.datasets.dataset.Dataset
Instantiate a dataset directly from arguments without requiring a YAML file.
This function is completely independent from load_dataset() and directly instantiates the dataset class without using OmegaConf objects.
- Parameters:
dataset_class – The full import path to the Dataset class to instantiate.
organism – The organism of the dataset.
path – The local or remote path to the dataset file.
**kwargs – Additional key-value pairs for the dataset config.
- Returns:
Instantiated dataset object with data loaded.
Example
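    from czbenchmarks.datasets.types import Organism
    from czbenchmarks.datasets.utils import load_local_dataset

    dataset = load_local_dataset(
        dataset_class="czbenchmarks.datasets.SingleCellLabeledDataset",
        organism=Organism.HUMAN,
        path="example-small.h5ad",
    )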
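Instantiating a class from its full import path can be sketched with the standard library alone. The helper below, `instantiate_from_path`, is hypothetical and only illustrates the general technique. It makes no claim about how load_local_dataset resolves the class internally, and it uses `fractions.Fraction` as a stand-in so the example runs without czbenchmarks installed.

```python
import importlib
from typing import Any


def instantiate_from_path(class_path: str, **kwargs: Any) -> Any:
    """Import the class named by a dotted path and call it with kwargs."""
    # Split "package.module.ClassName" into module path and class name.
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected a dotted import path, got {class_path!r}")
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    return cls(**kwargs)


# Stand-in class from the standard library, used only for illustration.
frac = instantiate_from_path("fractions.Fraction", numerator=3, denominator=4)
print(frac)  # 3/4
```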