czbenchmarks.tasks.label_prediction
Attributes
- logger
Classes
- MetadataLabelPredictionTaskInput: Pydantic model for MetadataLabelPredictionTask inputs.
- MetadataLabelPredictionOutput: Output for label prediction task.
- MetadataLabelPredictionTask: Task for predicting labels from embeddings using cross-validation.
Module Contents
- czbenchmarks.tasks.label_prediction.logger
- class czbenchmarks.tasks.label_prediction.MetadataLabelPredictionTaskInput(/, **data: Any)[source]
Bases:
czbenchmarks.tasks.task.TaskInput
Pydantic model for MetadataLabelPredictionTask inputs.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- labels: czbenchmarks.types.ListLike
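The validation behaviour described above can be illustrated with a minimal sketch. The class `LabelInputSketch` is a hypothetical stand-in, not the library's model, and it assumes that a plain list of strings is an acceptable value for `labels`; the real `ListLike` type may accept more (for example arrays or pandas Series).

```python
from typing import List

from pydantic import BaseModel, ValidationError


class LabelInputSketch(BaseModel):
    """Hypothetical stand-in for a task input with a single labels field."""

    # Assumption: labels are strings; the real field is typed as ListLike.
    labels: List[str]


# Keyword arguments are parsed and validated when the model is created.
valid = LabelInputSketch(labels=["B cell", "T cell", "B cell"])

# A value that cannot be coerced into a list of strings is rejected.
try:
    LabelInputSketch(labels=42)
    error_raised = False
except ValidationError:
    error_raised = True
```

Because validation happens at construction time, a malformed `labels` argument fails before any classifier is trained.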
- class czbenchmarks.tasks.label_prediction.MetadataLabelPredictionOutput(/, **data: Any)[source]
Bases:
czbenchmarks.tasks.task.TaskOutput
Output for label prediction task.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- class czbenchmarks.tasks.label_prediction.MetadataLabelPredictionTask(*, random_seed: int = RANDOM_SEED)[source]
Bases:
czbenchmarks.tasks.task.Task
Task for predicting labels from embeddings using cross-validation.
Evaluates multiple classifiers (Logistic Regression, KNN) using k-fold cross-validation. Reports standard classification metrics.
- Parameters:
random_seed (int) – Random seed for reproducibility
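As an illustration of the evaluation described above, the following sketch runs k-fold cross-validation with logistic regression and KNN on a synthetic embedding using scikit-learn. It is not the library's implementation: the function `cross_validate_labels`, the fold count, the scaling step, the chosen metrics, and the value of `RANDOM_SEED` are all assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

RANDOM_SEED = 42  # hypothetical value; the library defines its own RANDOM_SEED


def cross_validate_labels(embedding, labels, n_folds=5, random_seed=RANDOM_SEED):
    """Return mean accuracy and macro F1 per classifier across the folds."""
    classifiers = {
        "lr": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    }
    # Stratified folds keep the label proportions similar in every split.
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=random_seed)
    results = {}
    for name, clf in classifiers.items():
        scores = cross_validate(
            clf, embedding, labels, cv=folds, scoring=["accuracy", "f1_macro"]
        )
        results[name] = {
            "accuracy": float(scores["test_accuracy"].mean()),
            "f1_macro": float(scores["test_f1_macro"].mean()),
        }
    return results


# Synthetic stand-in for a model embedding: 100 cells, 8 dimensions, 2 labels.
rng = np.random.default_rng(RANDOM_SEED)
labels = np.repeat(["B cell", "T cell"], 50)
embedding = rng.normal(size=(100, 8))
embedding[labels == "T cell"] += 3.0  # shift one class so the two are separable

results = cross_validate_labels(embedding, labels)
```

Fixing the seed for the fold shuffling is what makes repeated runs comparable, which is the role the `random_seed` parameter plays in the task.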
- display_name = 'Label Prediction'
- description = 'Predict labels from embeddings using cross-validated classifiers and standard metrics.'
- input_model
- compute_baseline(expression_data: czbenchmarks.tasks.types.CellRepresentation, **kwargs) → czbenchmarks.tasks.types.CellRepresentation [source]
Set a baseline cell representation using raw gene expression.
Instead of using embeddings from a model, this method uses the raw gene expression matrix as features for classification. This provides a baseline performance to compare against model-generated embeddings for classification tasks.
- Parameters:
expression_data (czbenchmarks.tasks.types.CellRepresentation) – Gene expression data or embedding
- Returns:
Baseline cell representation derived from the raw gene expression
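The baseline idea can be sketched as follows, under the assumption that the raw expression matrix is passed through as the feature matrix without further transformation. The helper `raw_expression_baseline` is hypothetical and is not the library's `compute_baseline`; any preprocessing the real method applies is not shown here.

```python
import numpy as np


def raw_expression_baseline(expression_data):
    """Hypothetical stand-in: use the expression matrix itself as the representation."""
    # Sparse matrices (e.g. from scipy.sparse) expose toarray(); densify them.
    if hasattr(expression_data, "toarray"):
        return expression_data.toarray()
    return np.asarray(expression_data)


# Toy counts: 3 cells (rows) by 3 genes (columns).
counts = np.array([[0, 3, 1], [2, 0, 0], [5, 1, 4]])
baseline = raw_expression_baseline(counts)
```

The resulting matrix has one row per cell, so it can be fed to the same classifiers as a model embedding, and the two sets of scores can be compared directly.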