czbenchmarks.tasks.label_prediction

Attributes

logger

Classes

MetadataLabelPredictionTaskInput

Pydantic model for MetadataLabelPredictionTask inputs.

MetadataLabelPredictionOutput

Output for label prediction task.

MetadataLabelPredictionTask

Task for predicting labels from embeddings using cross-validation.

Module Contents

czbenchmarks.tasks.label_prediction.logger
class czbenchmarks.tasks.label_prediction.MetadataLabelPredictionTaskInput(/, **data: Any)

Bases: czbenchmarks.tasks.task.TaskInput

Pydantic model for MetadataLabelPredictionTask inputs.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

labels: czbenchmarks.types.ListLike
n_folds: int = 5
min_class_size: int = 10
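
The field list above does not say how min_class_size is applied. One plausible reading, assumed here and not confirmed by this page, is that classes with fewer than min_class_size samples are dropped before cross-validation. The helper keep_large_classes and the toy labels below are hypothetical and only illustrate that idea:

```python
from collections import Counter
from typing import Hashable, List, Sequence


def keep_large_classes(
    labels: Sequence[Hashable], min_class_size: int = 10
) -> List[int]:
    """Return indices of samples whose class has at least `min_class_size` members.

    Hypothetical helper: an assumed interpretation of `min_class_size`,
    not the czbenchmarks implementation.
    """
    counts = Counter(labels)
    return [i for i, label in enumerate(labels) if counts[label] >= min_class_size]


# Toy labels (hypothetical cell types): 12 "T cell", 11 "B cell", 3 "rare".
labels = ["T cell"] * 12 + ["B cell"] * 11 + ["rare"] * 3
kept = keep_large_classes(labels, min_class_size=10)
kept_labels = [labels[i] for i in kept]
print(sorted(set(kept_labels)), len(kept))
```

Filtering of this kind keeps every fold from being dominated by classes too small to appear in both the training and the held-out split.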
class czbenchmarks.tasks.label_prediction.MetadataLabelPredictionOutput(/, **data: Any)

Bases: czbenchmarks.tasks.task.TaskOutput

Output for label prediction task.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

results: List[Dict[str, Any]]
class czbenchmarks.tasks.label_prediction.MetadataLabelPredictionTask(*, random_seed: int = RANDOM_SEED)

Bases: czbenchmarks.tasks.task.Task

Task for predicting labels from embeddings using cross-validation.

Evaluates multiple classifiers (Logistic Regression, KNN) using k-fold cross-validation. Reports standard classification metrics.

Parameters:

random_seed (int) – Random seed for reproducibility
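
The evaluation described above can be sketched without any dependencies. This is a minimal illustration of seeded k-fold cross-validation, not the task's actual code. It swaps in a hand-rolled KNN classifier, omits Logistic Regression, and uses plain (non-stratified) folds. The function names, the seed value 42, and the toy data are all assumptions made for the example:

```python
import random
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def k_fold_indices(n_samples: int, n_folds: int, seed: int) -> List[List[int]]:
    """Shuffle sample indices with a fixed seed and deal them into n_folds folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(
    train_x: Sequence[Vector], train_y: Sequence[str], query: Vector, k: int = 3
) -> str:
    """Majority vote among the k nearest training points (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


def cross_validate(
    features: Sequence[Vector], labels: Sequence[str], n_folds: int = 5, seed: int = 42
) -> List[float]:
    """Return the accuracy of the KNN classifier on each held-out fold."""
    folds = k_fold_indices(len(features), n_folds, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(features)) if i not in held_out]
        train_x = [features[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        correct = sum(
            knn_predict(train_x, train_y, features[i]) == labels[i] for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return accuracies


# Toy "embeddings": two well-separated clusters of 10 points each.
features = [(0.1 * i, 0.0) for i in range(10)] + [
    (10.0 + 0.1 * i, 10.0) for i in range(10)
]
labels = ["a"] * 10 + ["b"] * 10
fold_accuracies = cross_validate(features, labels, n_folds=5, seed=42)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
print(fold_accuracies, mean_accuracy)
```

Fixing the seed makes the fold assignment repeatable, which is the role random_seed plays in the signature above.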

display_name = 'Label Prediction'
description = 'Predict labels from embeddings using cross-validated classifiers and standard metrics.'
input_model
compute_baseline(expression_data: czbenchmarks.tasks.types.CellRepresentation, **kwargs) → czbenchmarks.tasks.types.CellRepresentation

Set a baseline cell representation using raw gene expression.

Instead of using embeddings from a model, this method uses the raw gene expression matrix as features for classification. This provides a baseline performance to compare against model-generated embeddings for classification tasks.

Parameters:

expression_data – gene expression data or embedding

Returns:

Baseline embedding
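
To illustrate why such a baseline is useful, the sketch below scores the same simple classifier on a raw expression matrix and on a stand-in embedding. Everything here is hypothetical: the helper loo_1nn_accuracy, the toy matrix, and the two-dimensional "embedding" are assumptions for the example and do not reflect how compute_baseline or the task is implemented.

```python
from typing import Sequence


def loo_1nn_accuracy(
    features: Sequence[Sequence[float]], labels: Sequence[str]
) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, query in enumerate(features):
        # Nearest other sample by squared Euclidean distance.
        best_j = min(
            (j for j in range(len(features)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(features[j], query)),
        )
        correct += labels[best_j] == labels[i]
    return correct / len(features)


# Hypothetical raw expression matrix: rows are cells, columns are 4 genes.
raw_expression = [
    [5.0, 4.0, 0.0, 1.0],
    [6.0, 5.0, 1.0, 0.0],
    [4.0, 6.0, 0.0, 0.0],
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 6.0, 4.0],
    [0.0, 0.0, 4.0, 5.0],
]
labels = ["T", "T", "T", "B", "B", "B"]

# Hypothetical stand-in for a model embedding: sums of genes 0-1 and genes 2-3.
model_embedding = [[row[0] + row[1], row[2] + row[3]] for row in raw_expression]

baseline_score = loo_1nn_accuracy(raw_expression, labels)
embedding_score = loo_1nn_accuracy(model_embedding, labels)
print(f"baseline={baseline_score:.2f} embedding={embedding_score:.2f}")
```

Comparing the two scores shows whether an embedding adds anything over the unprocessed features, which is the comparison the baseline is meant to support.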