cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder
- class cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder(experiment: Experiment, measurement_name: str = 'RNA', layer_name: str = 'raw', *, block_size: int | None = None, **kwargs: Any)
Abstract base class for processing CELLxGENE Census ExperimentAxisQuery results into a Hugging Face Dataset in which each item represents one cell. Subclasses implement the cell_item() method to turn each row of an X layer into a Dataset item. They may also override __init__() and the context manager's __enter__() to perform any necessary preprocessing.
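The subclassing contract described above can be sketched without the library installed. The block below is an illustration only: ToyCellDatasetBuilder is a hypothetical stand-in for the real base class, a sparse X row is modeled as a plain dict of {var_index: count} rather than the sparse row object the library passes, and TopGenesBuilder and its top_n parameter are invented for the example.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple

# ASSUMPTION: a sparse X row is modeled as {var_index: count}. The real
# library passes its own sparse row type to cell_item().
SparseRow = Dict[int, float]


class ToyCellDatasetBuilder(ABC):
    """Hypothetical stand-in for CellDatasetBuilder (not the library's code)."""

    def __init__(self, rows: List[Tuple[int, SparseRow]]) -> None:
        # (cell_joinid, X row) pairs standing in for the query results.
        self.rows = rows

    def __enter__(self) -> "ToyCellDatasetBuilder":
        # Subclasses may override this to do preprocessing before build().
        return self

    def __exit__(self, *exc: Any) -> None:
        return None

    @abstractmethod
    def cell_item(self, cell_joinid: int, Xrow: SparseRow) -> Dict[str, Any]:
        """Turn the X row for one cell into one dataset item."""

    def build(self) -> List[Dict[str, Any]]:
        # One item per cell, in row order.
        return [self.cell_item(joinid, row) for joinid, row in self.rows]


class TopGenesBuilder(ToyCellDatasetBuilder):
    """Keeps the indices of the top_n most highly expressed genes per cell."""

    def __init__(self, rows: List[Tuple[int, SparseRow]], top_n: int = 2) -> None:
        super().__init__(rows)
        self.top_n = top_n

    def cell_item(self, cell_joinid: int, Xrow: SparseRow) -> Dict[str, Any]:
        # Sort gene indices by descending count, breaking ties by index.
        ranked = sorted(Xrow, key=lambda j: (-Xrow[j], j))
        return {"soma_joinid": cell_joinid, "top_genes": ranked[: self.top_n]}


rows = [(10, {0: 1.0, 3: 5.0, 7: 2.0}), (11, {2: 4.0})]
with TopGenesBuilder(rows, top_n=2) as builder:
    items = builder.build()
```

The only method a subclass must supply is cell_item(). Everything else (iteration, context management, assembling the result) stays in the base class, which is the division of labor the class description implies.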
The base class inherits from ExperimentAxisQuery, so typical usage would be:
```python
import cellxgene_census
import tiledbsoma
from cellxgene_census.experimental.ml import GeneformerTokenizer

# SubclassOfCellDatasetBuilder stands for any concrete subclass.
with cellxgene_census.open_soma() as census:
    with SubclassOfCellDatasetBuilder(
        census["census_data"]["homo_sapiens"],
        obs_query=tiledbsoma.AxisQuery(...),  # define some subset of Census cells
        # ... other ExperimentAxisQuery parameters, e.g. var_query
    ) as builder:
        dataset = builder.build()
```
- __init__(experiment: Experiment, measurement_name: str = 'RNA', layer_name: str = 'raw', *, block_size: int | None = None, **kwargs: Any)
Initialize the CellDatasetBuilder to process the results of a Census ExperimentAxisQuery.
- experiment: Census Experiment to be queried.
- measurement_name: Measurement in the experiment, default "RNA".
- layer_name: Name of the X layer to process, default "raw".
- block_size: Number of cells to process in-memory at once. If unspecified, tiledbsoma.SparseNDArrayRead.blockwise() will select a default.
- kwargs: Passed through to ExperimentAxisQuery(), especially obs_query and var_query.
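To make the block_size trade-off concrete, here is a minimal sketch of processing cells in fixed-size blocks. It is not how tiledbsoma implements blockwise(); iter_blocks is a hypothetical helper that only shows the idea that at most block_size cells are held in memory at a time.

```python
from typing import Iterator, List, Sequence


def iter_blocks(joinids: Sequence[int], block_size: int) -> Iterator[List[int]]:
    """Yield consecutive blocks of at most block_size cell joinids."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(joinids), block_size):
        # Only this slice needs to be resident while it is processed.
        yield list(joinids[start : start + block_size])


# Seven cells processed three at a time: the last block is shorter.
blocks = list(iter_blocks(range(7), block_size=3))
```

A smaller block_size lowers peak memory at the cost of more read round trips; a larger one does the opposite.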
Methods
- X(layer_name, *[, batch_size, partitions, ...]): Returns an X layer as a sparse read.
- __init__(experiment[, measurement_name, ...]): Initialize the CellDatasetBuilder to process the results of a Census ExperimentAxisQuery.
- build([from_generator_kwargs]): Build the dataset from query results.
- cell_item(cell_joinid, Xrow): Abstract method to process the X row for one cell into a Dataset item.
- close(): Releases resources associated with this query.
- obs(*[, column_names, batch_size, ...]): Returns obs as an Arrow table iterator.
- obs_joinids(): Returns obs soma_joinids as an Arrow array.
- obsm(layer): Returns an obsm layer as a sparse read.
- obsp(layer): Returns an obsp layer as a sparse read.
- to_anndata(X_name, *[, column_names, ...]): Executes the query and returns the result as an in-memory AnnData object.
- var(*[, column_names, batch_size, ...]): Returns var as an Arrow table iterator.
- var_joinids(): Returns var soma_joinids as an Arrow array.
- varm(layer): Returns a varm layer as a sparse read.
- varp(layer): Returns a varp layer as a sparse read.
Attributes
- indexer: A soma_joinid indexer for both obs and var axes.
- n_obs: The number of obs axis query results.
- n_vars: The number of var axis query results.