czbenchmarks.tasks.single_cell.perturbation_expression_prediction

Attributes

logger

Classes

PerturbationExpressionPredictionTaskInput

Pydantic model for Perturbation task inputs.

PerturbationExpressionPredictionOutput

Output for perturbation task.

PerturbationExpressionPredictionTask

Task that evaluates perturbation-induced expression predictions against ground truth.

Functions

build_task_input_from_predictions(...)

Create a task input from a predictions AnnData and the dataset AnnData.

Module Contents

czbenchmarks.tasks.single_cell.perturbation_expression_prediction.logger
class czbenchmarks.tasks.single_cell.perturbation_expression_prediction.PerturbationExpressionPredictionTaskInput(/, **data: Any)[source]

Bases: czbenchmarks.tasks.task.TaskInput

Pydantic model for Perturbation task inputs.

Container for the input parameters of the PerturbationExpressionPredictionTask. The row and column ordering of the model predictions can optionally be provided as cell_index and gene_index, respectively, so the task can align a model matrix that is a subset of, or re-ordered relative to, the dataset adata.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

adata: anndata.AnnData
pred_effect_operation: Literal['difference', 'ratio'] = 'ratio'
gene_index: pandas.Index | None = None
cell_index: pandas.Index | None = None
czbenchmarks.tasks.single_cell.perturbation_expression_prediction.build_task_input_from_predictions(predictions_adata: anndata.AnnData, dataset_adata: anndata.AnnData, pred_effect_operation: Literal['difference', 'ratio'] = 'ratio') → PerturbationExpressionPredictionTaskInput[source]

Create a task input from a predictions AnnData and the dataset AnnData.

This preserves the predictions’ obs/var order so the task can align matrices without forcing the caller to reorder arrays.

Parameters:
  • predictions_adata (ad.AnnData) – The anndata containing model predictions.

  • dataset_adata (ad.AnnData) – The anndata object from SingleCellPerturbationDataset.

  • pred_effect_operation (Literal["difference", "ratio"]) – How to compute predicted effect between treated and control mean predictions over genes. “difference” uses mean(treated) - mean(control) and is generally safe across scales (probabilities, z-scores, raw expression). “ratio” uses log((mean(treated)+eps)/(mean(control)+eps)) when means are positive. Default is “ratio”.

  • gene_index (Optional[pd.Index]) – The index of the genes in the predictions AnnData.

  • cell_index (Optional[pd.Index]) – The index of the cells in the predictions AnnData.
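
A minimal usage sketch follows. The predictions AnnData below is purely illustrative (random values, made-up obs/var names), and dataset_adata is assumed to be an AnnData produced by SingleCellPerturbationDataset (not constructed here); only the call to build_task_input_from_predictions reflects the documented signature.

    import anndata as ad
    import numpy as np
    import pandas as pd

    from czbenchmarks.tasks.single_cell.perturbation_expression_prediction import (
        build_task_input_from_predictions,
    )

    # Illustrative predictions: rows/columns may be a subset of, or re-ordered
    # relative to, the dataset AnnData; obs/var names are used for alignment.
    predictions_adata = ad.AnnData(
        X=np.random.default_rng(0).random((4, 3)),
        obs=pd.DataFrame(index=["cell_2", "cell_0", "cell_1", "cell_3"]),
        var=pd.DataFrame(index=["gene_b", "gene_a", "gene_c"]),
    )

    # dataset_adata is assumed to come from SingleCellPerturbationDataset (not shown).
    task_input = build_task_input_from_predictions(
        predictions_adata=predictions_adata,
        dataset_adata=dataset_adata,
        pred_effect_operation="difference",  # default is "ratio"
    )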

class czbenchmarks.tasks.single_cell.perturbation_expression_prediction.PerturbationExpressionPredictionOutput(/, **data: Any)[source]

Bases: czbenchmarks.tasks.task.TaskOutput

Output for perturbation task.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

pred_mean_change_dict: Dict[str, numpy.ndarray]
true_mean_change_dict: Dict[str, numpy.ndarray]
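
Both dictionaries map a condition name to a per-gene effect vector (predicted and ground truth, respectively). The toy instantiation below is a sketch only, assuming the model accepts numpy arrays directly as the field annotations suggest; the condition name and values are made up.

    import numpy as np

    from czbenchmarks.tasks.single_cell.perturbation_expression_prediction import (
        PerturbationExpressionPredictionOutput,
    )

    # Hypothetical per-condition effect vectors, one value per evaluated gene.
    output = PerturbationExpressionPredictionOutput(
        pred_mean_change_dict={"condition_A": np.array([0.8, -0.1, 0.3])},
        true_mean_change_dict={"condition_A": np.array([1.1, 0.0, 0.2])},
    )
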
class czbenchmarks.tasks.single_cell.perturbation_expression_prediction.PerturbationExpressionPredictionTask(*, random_seed: int = RANDOM_SEED)[source]

Bases: czbenchmarks.tasks.task.Task

Inherits from Task, the abstract base class for all benchmark tasks, which defines the interface that every task must implement. Tasks are responsible for:

  1. Declaring their required input/output data types

  2. Running task-specific computations

  3. Computing evaluation metrics

Tasks should store any intermediate results as instance variables to be used in metric computation.

Parameters:

random_seed (int) – Random seed for reproducibility

Perturbation Expression Prediction Task.

This task evaluates perturbation-induced expression predictions against their ground truth values. This is done by calculating metrics derived from predicted and ground truth log fold change values for each condition. Currently, Spearman rank correlation is supported.

The following arguments are supplied through the task input class (PerturbationExpressionPredictionTaskInput) when running the task and are described here for documentation purposes:

  • predictions_adata (ad.AnnData):

    The anndata containing model predictions

  • dataset_adata (ad.AnnData):

    The anndata object from SingleCellPerturbationDataset.

  • pred_effect_operation (Literal[“difference”, “ratio”]):

    How to compute predicted effect between treated and control mean predictions over genes.

    • “ratio” uses \(\log\left(\frac{\text{mean}(\text{treated}) + \varepsilon}{\text{mean}(\text{control}) + \varepsilon}\right)\) when means are positive.

    • “difference” uses \(\text{mean}(\text{treated}) - \text{mean}(\text{control})\) and is generally safe across scales (probabilities, z-scores, raw expression).

    Default is “ratio”. A brief numerical sketch of both operations follows this list.

  • gene_index (Optional[pd.Index]):

    The index of the genes in the predictions AnnData.

  • cell_index (Optional[pd.Index]):

    The index of the cells in the predictions AnnData.
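
Both pred_effect_operation choices reduce to elementwise arithmetic over per-gene means. The sketch below uses numpy only; the epsilon value and the expression means are illustrative, not taken from the library.

    import numpy as np

    eps = 1e-8  # illustrative epsilon; the library's exact value is not documented here

    # Hypothetical mean predicted expression per gene, over treated and control cells.
    mean_treated = np.array([2.0, 0.5, 1.0])
    mean_control = np.array([1.0, 1.0, 1.0])

    # "difference": mean(treated) - mean(control); safe across scales.
    effect_difference = mean_treated - mean_control  # [ 1.0, -0.5,  0.0]

    # "ratio": log((mean(treated) + eps) / (mean(control) + eps)); assumes positive means.
    effect_ratio = np.log((mean_treated + eps) / (mean_control + eps))  # [ 0.693, -0.693, 0.0]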

Parameters:

random_seed (int) – Random seed for reproducibility.

Returns:

Dictionaries of mean predicted and ground truth changes in gene expression for each condition.

Return type:

PerturbationExpressionPredictionOutput
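
Given such an output, the per-condition Spearman rank correlation described above can be computed along these lines. This is a sketch only: output is assumed to be a PerturbationExpressionPredictionOutput produced by running the task (the exact run entry point is not shown on this page), and the metric computation here uses scipy directly rather than the library's metric machinery.

    from scipy.stats import spearmanr

    # `output` is assumed to be a PerturbationExpressionPredictionOutput;
    # each dict maps a condition name to a per-gene effect vector.
    per_condition_rho = {}
    for condition, pred_change in output.pred_mean_change_dict.items():
        true_change = output.true_mean_change_dict[condition]
        rho, _pvalue = spearmanr(pred_change, true_change)
        per_condition_rho[condition] = rho

    print(per_condition_rho)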

display_name = 'Perturbation Expression Prediction'
description = 'Evaluate the quality of predicted changes in expression levels for genes that are...
input_model
condition_key = None
abstract compute_baseline(**kwargs)[source]

Set a baseline embedding for perturbation expression prediction.

This method is not implemented for perturbation expression prediction tasks.

Raises:

NotImplementedError – Always raised, because a baseline is not implemented for this task.