cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder
- class cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder(experiment: Experiment, measurement_name: str = 'RNA', layer_name: str = 'raw', *, block_size: int | None = None, **kwargs: Any)
Abstract base class for methods to process CELLxGENE Census ExperimentAxisQuery results into a Hugging Face Dataset in which each item represents one cell. Subclasses implement the cell_item() method to process each row of an X layer into a Dataset item, and may also override __init__() and context __enter__() to perform any necessary preprocessing.
DEPRECATION NOTICE: this class is planned for removal from the cellxgene_census API and for migration into git:cellxgene-census/tools/models/geneformer.
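As a rough illustration of the subclassing contract described above, the sketch below defines a hypothetical TopGenesBuilder whose cell_item() turns one X row into a small dictionary. It assumes Xrow exposes .indices and .data in the way a single-row scipy.sparse CSR matrix does; this page does not state the row type, so treat that as an assumption. The helper top_expressed(), the class name, and the item keys are all invented for the example, and the import falls back to object only so the snippet can be run without cellxgene_census installed.

```python
from typing import Any, Dict, List, Sequence

try:
    from cellxgene_census.experimental.ml.huggingface import CellDatasetBuilder
except ImportError:  # fallback so this sketch runs without cellxgene_census
    CellDatasetBuilder = object


def top_expressed(col_indices: Sequence[int], values: Sequence[float], k: int) -> List[int]:
    """Return the k column positions with the largest values, highest first.

    Ties are broken by the smaller column position.
    """
    ranked = sorted(zip(col_indices, values), key=lambda pair: (-pair[1], pair[0]))
    return [int(col) for col, _ in ranked[:k]]


class TopGenesBuilder(CellDatasetBuilder):
    """Hypothetical subclass: each item lists a cell's most expressed columns."""

    top_k = 3  # illustrative setting, not part of the base class

    def cell_item(self, cell_joinid: int, Xrow: Any) -> Dict[str, Any]:
        # Assumption: Xrow behaves like a one-row CSR matrix, so .indices holds
        # the non-zero column positions and .data the matching values.
        return {
            "soma_joinid": int(cell_joinid),
            "top_columns": top_expressed(Xrow.indices, Xrow.data, self.top_k),
        }
```

Keeping the per-row logic in a plain function makes it possible to check it on small inputs without opening the Census.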
The base class inherits ExperimentAxisQuery, so typical usage would be:
```
import cellxgene_census
import tiledbsoma
from cellxgene_census.experimental.ml import GeneformerTokenizer

with cellxgene_census.open_soma() as census:
    with SubclassOfCellDatasetBuilder(
        census["census_data"]["homo_sapiens"],
        obs_query=tiledbsoma.AxisQuery(...),  # define some subset of Census cells
        # ... other ExperimentAxisQuery parameters, e.g. var_query
    ) as builder:
        dataset = builder.build()
```
- __init__(experiment: Experiment, measurement_name: str = 'RNA', layer_name: str = 'raw', *, block_size: int | None = None, **kwargs: Any)
Initialize the CellDatasetBuilder to process the results of a Census ExperimentAxisQuery.
experiment: Census Experiment to be queried.
measurement_name: Measurement in the experiment, default “RNA”.
layer_name: Name of the X layer to process, default “raw”.
block_size: Number of cells to process in-memory at once. If unspecified, tiledbsoma.SparseNDArrayRead.blockwise() will select a default.
kwargs: passed through to ExperimentAxisQuery(), especially obs_query and var_query.
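To make the role of block_size concrete, the following generic sketch shows block-wise iteration: a stream of cell ids is consumed in chunks of at most block_size, so only one chunk is held in memory at a time. This is not the tiledbsoma implementation; iter_blocks() is a hypothetical helper written only to illustrate the idea.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_blocks(cell_ids: Iterable[int], block_size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of at most block_size cell ids."""
    if block_size < 1:
        raise ValueError("block_size must be positive")
    iterator = iter(cell_ids)
    while True:
        # Pull the next chunk; an empty chunk means the input is exhausted.
        block = list(islice(iterator, block_size))
        if not block:
            return
        yield block


for block in iter_blocks(range(5), block_size=2):
    print(block)
```

A smaller block_size lowers peak memory use at the cost of more read iterations; a larger one does the opposite.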
Methods
- X(layer_name, *[, batch_size, partitions, ...]): Returns an X layer as a sparse read.
- __init__(experiment[, measurement_name, ...]): Initialize the CellDatasetBuilder to process the results of a Census ExperimentAxisQuery.
- build([from_generator_kwargs]): Build the dataset from query results.
- cell_item(cell_joinid, Xrow): Abstract method to process the X row for one cell into a Dataset item.
- close(): Releases resources associated with this query.
- obs(*[, column_names, batch_size, ...]): Returns obs as an Arrow table iterator.
- obs_joinids(): Returns obs soma_joinids as an Arrow array.
- obs_scene_ids(): Returns a pyarrow array with scene ids that contain obs from this query.
- obsm(layer): Returns an obsm layer as a sparse read.
- obsp(layer): Returns an obsp layer as a sparse read.
- to_anndata(X_name, *[, column_names, ...]): Exports the query to an in-memory AnnData object.
- to_spatialdata(X_name, *[, column_names, ...]): Returns a SpatialData object containing the query results.
- var(*[, column_names, batch_size, ...]): Returns var as an Arrow table iterator.
- var_joinids(): Returns var soma_joinids as an Arrow array.
- var_scene_ids(): Returns a pyarrow array with scene ids that contain var from this query.
- varm(layer): Returns a varm layer as a sparse read.
- varp(layer): Returns a varp layer as a sparse read.
Attributes
- indexer: A soma_joinid indexer for both obs and var axes.
- n_obs: The number of obs axis query results.
- n_vars: The number of var axis query results.