regvelo.REGVELOVI
- class regvelo.REGVELOVI(adata, W=None, regulators=None, soft_constraint=True, lam=1, lam2=0, **model_kwargs)
Class implementing Regulatory Velocity Variational Inference (REGVELOVI).
This model extends the VAE framework to incorporate gene regulatory network (GRN) priors into RNA velocity modeling.
- Parameters:
adata (AnnData) – Annotated data object that has been registered via setup_anndata().
W (Optional[Tensor]) – Tensor of shape [n_targets, n_regulators], where rows indicate targets and columns indicate regulators.
regulators (Optional[list]) – List of transcription factors.
soft_constraint (bool) – Whether to use a soft constraint mode (as opposed to a hard constraint).
lam (float) – Regularization parameter controlling the strength of GRN prior incorporation.
lam2 (float) – Regularization parameter controlling the strength of L1 regularization on the Jacobian matrix.
**model_kwargs – Additional keyword arguments passed to the VELOVAE module.
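The orientation of W (rows are targets, columns are regulators) is easy to get backwards. The sketch below builds such a matrix from a list of regulator-to-target edges in plain Python. The gene names, the build_prior_matrix helper, and the binary 0/1 encoding are illustrative assumptions rather than part of the regvelo API, and the result would still need converting to a tensor (for example with torch.tensor) before being passed as W.

```python
def build_prior_matrix(targets, regulators, edges):
    """Return a nested list with one row per target and one column per regulator."""
    row = {gene: i for i, gene in enumerate(targets)}
    col = {gene: j for j, gene in enumerate(regulators)}
    W = [[0.0] * len(regulators) for _ in targets]
    for regulator, target in edges:
        # Assumption: a 1.0 entry marks "regulator acts on target".
        W[row[target]][col[regulator]] = 1.0
    return W


# Hypothetical gene names, for illustration only.
targets = ["GeneA", "GeneB", "GeneC"]
regulators = ["TF1", "TF2"]
edges = [("TF1", "GeneA"), ("TF2", "GeneA"), ("TF2", "GeneC")]

W = build_prior_matrix(targets, regulators, edges)
```

Here len(W) equals the number of targets and len(W[0]) equals the number of regulators, matching the [n_targets, n_regulators] shape described above.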
Attributes table
- Data attached to model instance.
- Manager instance associated with self.adata.
- The current device that the module's params are on.
- Returns computed metrics during training.
- Whether the model has been trained.
- Summary string of the model.
- Observations that are in test set.
- Observations that are in train set.
- Observations that are in validation set.
Methods table
- add_regvelo_outputs_to_adata – Adds RegVelo model outputs to the AnnData object.
- Computes a shared pseudotime trajectory across genes or cells.
- convert_legacy_save – Converts a legacy saved model (<v0.15.0) to the updated save format.
- deregister_manager – Deregisters the AnnDataManager instance associated with adata.
- get_anndata_manager – Retrieves the AnnDataManager for a given AnnData object.
- get_directional_uncertainty – Computes directional uncertainty metrics for RNA velocity vectors.
- get_elbo – Compute the evidence lower bound (ELBO) on the data.
- get_from_registry – Returns the object in AnnData associated with the key in the data registry.
- get_latent_representation – Compute the latent representation of the data.
- get_latent_time – Returns the inferred latent time for each cell and gene.
- get_marginal_ll – Compute the marginal log-likelihood of the data.
- get_permutation_scores – Computes permutation scores for gene dynamics across cell types.
- get_rates – Returns the inferred kinetic parameters from the trained model.
- get_reconstruction_error – Compute the reconstruction error on the data.
- get_velocity – Returns velocity estimates for each gene in each cell.
- load – Instantiate a model from the saved output.
- load_registry – Return the full registry saved with the model.
- register_manager – Registers an AnnDataManager instance with this model class.
- rgv_expression_fit – Returns the model-fitted unspliced and spliced expression (u(t), s(t)).
- save – Save the state of the model.
- setup_anndata – Sets up the AnnData object for use with REGVELOVI.
- to_device – Move model to device.
- train – Train the REGVELOVI model.
- view_anndata_setup – Print summary of the setup for the initial AnnData or a given AnnData object.
- Print args used to setup a saved model.
Attributes
Methods
- REGVELOVI.add_regvelo_outputs_to_adata(n_samples=30, adata=None, batch_size=None)
Adds RegVelo model outputs to the AnnData object. This function computes latent time and velocity estimates and stores them in .layers of the AnnData object. It also applies a per-gene scaling of latent time to produce aligned fit values.
Adapted from the veloVI repository: https://github.com/YosefLab/velovi/
- Parameters:
n_samples (int) – Number of posterior samples to draw for estimation.
adata (Optional[AnnData]) – Annotated data object with the same structure as the one used during model setup. If None, uses the registered AnnData.
batch_size (Optional[int]) – Mini-batch size for processing data. If None, uses the model's default batch size in SCVI.
- Returns:
A copy of the target-gene subset of the input AnnData with new layers: 'velocity', 'latent_time_regvelo', 'fit_t', and 'fit_scaling'.
Computes a shared pseudotime trajectory across genes or cells.
- Returns:
The shared pseudotime vector across cells or genes, normalized if norm=True.
- classmethod REGVELOVI.convert_legacy_save(dir_path, output_dir_path, overwrite=False, prefix=None, **save_kwargs)
Converts a legacy saved model (<v0.15.0) to the updated save format.
- Parameters:
dir_path (str) – Path to directory where legacy model is saved.
output_dir_path (str) – Path to save converted save files.
overwrite (bool) – Overwrite existing data or not. If False and a directory already exists at output_dir_path, an error is raised.
**save_kwargs – Keyword arguments passed into save().
- REGVELOVI.deregister_manager(adata=None)
Deregisters the AnnDataManager instance associated with adata.
If adata is None, deregisters all AnnDataManager instances in both the class and instance-specific manager stores, except for the one associated with this model instance.
- Parameters:
adata (AnnData | None)
- REGVELOVI.get_anndata_manager(adata, required=False)
Retrieves the AnnDataManager for a given AnnData object.
Requires that self.id has been set. Checks for an AnnDataManager specific to this model instance.
- REGVELOVI.get_directional_uncertainty(adata=None, n_samples=50, gene_list=None, n_jobs=-1)
Computes directional uncertainty metrics for RNA velocity vectors.
- Parameters:
adata (Optional[AnnData]) – Annotated data object with the same structure as the one used during model setup. If None, uses the registered AnnData.
n_samples (int) – Number of posterior samples to draw for estimating directional uncertainty.
gene_list (Optional[Iterable[str]]) – List of genes to include in the analysis. If None, all genes are used.
n_jobs (int) – Number of parallel jobs to use for computation. If -1, uses all available cores.
- Returns:
A tuple of two elements. The first element is a DataFrame containing directional variance, difference, and cosine similarity metrics for each cell, indexed by cell names. The second element is a NumPy array of cosine similarities.
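As a rough illustration of what a cosine-similarity-based directional metric measures, the sketch below compares each sampled velocity vector for one cell with the mean of the samples and reports the variance of those similarities. This is a plain-Python illustration under assumed inputs; the helper names and the toy samples are hypothetical, and the exact computation inside get_directional_uncertainty is not described in this reference.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def directional_spread(samples):
    """Cosine similarity of each sample to the sample mean, plus their variance."""
    n = len(samples)
    mean_vec = [sum(column) / n for column in zip(*samples)]
    sims = [cosine_similarity(vec, mean_vec) for vec in samples]
    mean_sim = sum(sims) / n
    variance = sum((s - mean_sim) ** 2 for s in sims) / n
    return sims, variance


# Hypothetical posterior velocity samples for one cell over two genes.
consistent = [[1.0, 1.0], [2.0, 2.0], [0.5, 0.5]]   # same direction, different length
scattered = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # directions disagree

sims_consistent, var_consistent = directional_spread(consistent)
sims_scattered, var_scattered = directional_spread(scattered)
```

Samples that agree in direction give similarities near 1 and a variance near 0, while samples pointing in different directions give a larger variance.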
- REGVELOVI.get_elbo(adata=None, indices=None, batch_size=None, dataloader=None, return_mean=True, **kwargs)
Compute the evidence lower bound (ELBO) on the data.
The ELBO is the reconstruction error plus the Kullback-Leibler (KL) divergences between the variational distributions and the priors. It is different from the marginal log-likelihood; specifically, it is a lower bound on the marginal log-likelihood plus a term that is constant with respect to the variational distribution. It still gives good insights on the modeling of the data and is fast to compute.
- Parameters:
adata (Optional[AnnData]) – AnnData object with var_names in the same order as the ones used to train the model. If None and dataloader is also None, it defaults to the object used to initialize the model.
indices (Optional[Sequence[int]]) – Indices of observations in adata to use. If None, defaults to all observations. Ignored if dataloader is not None.
batch_size (Optional[int]) – Minibatch size for the forward pass. If None, defaults to scvi.settings.batch_size. Ignored if dataloader is not None.
dataloader (Optional[Iterator[dict[str, Tensor | None]]]) – An iterator over minibatches of data on which to compute the metric. The minibatches should be formatted as a dictionary of Tensor with keys as expected by the model. If None, a dataloader is created from adata.
return_mean (bool) – Whether to return the mean of the ELBO or the ELBO for each observation.
**kwargs – Additional keyword arguments to pass into the forward method of the module.
- Returns:
Evidence lower bound (ELBO) of the data.
Notes
This is not the negative ELBO, so higher is better.
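To make the lower-bound property concrete, here is a small worked example on a toy model that has nothing to do with REGVELOVI's actual likelihood: a prior z ~ N(0, 1) with x | z ~ N(z, 1), for which both the marginal log-likelihood and the ELBO have closed forms. The function names are hypothetical. The sign convention follows the note above (higher is better).

```python
import math


def log_marginal(x):
    """log p(x) for z ~ N(0, 1) and x | z ~ N(z, 1), which gives x ~ N(0, 2)."""
    return -0.5 * math.log(2.0 * math.pi * 2.0) - x * x / 4.0


def elbo(x, q_mean, q_var):
    """ELBO for a Gaussian variational distribution q(z) = N(q_mean, q_var)."""
    # Expected log-likelihood E_q[log N(x; z, 1)].
    expected_log_lik = -0.5 * math.log(2.0 * math.pi) - 0.5 * ((x - q_mean) ** 2 + q_var)
    # KL divergence between q(z) and the standard normal prior.
    kl = 0.5 * (q_var + q_mean ** 2 - 1.0 - math.log(q_var))
    return expected_log_lik - kl


x = 1.0
tight = elbo(x, q_mean=x / 2.0, q_var=0.5)  # q equals the exact posterior N(x/2, 1/2)
loose = elbo(x, q_mean=0.0, q_var=1.0)      # q equals the prior
```

With q set to the exact posterior the ELBO matches log_marginal(x); any other q, such as the prior, gives a strictly smaller value.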
- REGVELOVI.get_from_registry(adata, registry_key)
Returns the object in AnnData associated with the key in the data registry.
AnnData object should be registered with the model prior to calling this function via the self._validate_anndata method.
- REGVELOVI.get_latent_representation(adata=None, indices=None, give_mean=True, mc_samples=5000, batch_size=None, return_dist=False, dataloader=None)
Compute the latent representation of the data.
This is typically denoted as \(z_n\).
- Parameters:
adata (Optional[AnnData]) – AnnData object with var_names in the same order as the ones used to train the model. If None and dataloader is also None, it defaults to the object used to initialize the model.
indices (Optional[Sequence[int]]) – Indices of observations in adata to use. If None, defaults to all observations. Ignored if dataloader is not None.
give_mean (bool) – If True, returns the mean of the latent distribution. If False, returns an estimate of the mean using mc_samples Monte Carlo samples.
mc_samples (int) – Number of Monte Carlo samples to use for the estimator for distributions with no closed-form mean (e.g., the logistic normal distribution). Not used if give_mean is True or if return_dist is True.
batch_size (Optional[int]) – Minibatch size for the forward pass. If None, defaults to scvi.settings.batch_size. Ignored if dataloader is not None.
return_dist (bool) – If True, returns the mean and variance of the latent distribution. Otherwise, returns the mean of the latent distribution.
dataloader (Optional[Iterator[dict[str, Tensor | None]]]) – An iterator over minibatches of data on which to compute the metric. The minibatches should be formatted as a dictionary of Tensor with keys as expected by the model. If None, a dataloader is created from adata.
- Return type:
ndarray | tuple[ndarray, ndarray]
- Returns:
An array of shape (n_obs, n_latent) if return_dist is False. Otherwise, returns a tuple of arrays of shape (n_obs, n_latent) with the mean and variance of the latent distribution.
- REGVELOVI.get_latent_time(adata=None, indices=None, gene_list=None, n_samples=1, n_samples_overall=None, batch_size=None, return_mean=True, return_numpy=None)
Returns the inferred latent time for each cell and gene.
This function samples from the posterior distribution of the model to estimate latent transcriptional time for each gene in each cell. It supports subsampling, batching, and output customization.
Adapted from the veloVI repository: https://github.com/YosefLab/velovi/
- Parameters:
adata (Optional[AnnData]) – Annotated data object with the same structure as the one used during model setup. If None, uses the registered AnnData.
indices (Optional[Sequence[int]]) – List of cell indices to include. If None, all cells are used.
gene_list (Optional[Sequence[str]]) – List of genes to include in the output. If None, all genes are used.
n_samples (int) – Number of posterior samples to draw per cell.
n_samples_overall (Optional[int]) – Total number of cells to subsample. If set, n_samples is forced to 1.
batch_size (Optional[int]) – Mini-batch size for processing data. If None, uses default batch size in SCVI.
return_mean (bool) – If True, returns the mean over samples. If False, returns the full sample tensor.
return_numpy (Optional[bool]) – If True, returns a NumPy array. If False or None, returns a DataFrame with gene names as columns and cell names as rows.
- Return type:
ndarray | DataFrame
- Returns:
If n_samples > 1 and return_mean is False, returns an array of shape (samples, cells, genes). Otherwise, returns (cells, genes), as either a NumPy array or DataFrame depending on return_numpy.
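The relationship between the two output shapes can be sketched without the library: averaging a (samples, cells, genes) array over its first axis yields the (cells, genes) result that return_mean=True describes. The nested lists and the mean_over_samples helper below are hypothetical stand-ins for the model output, not regvelo code.

```python
def mean_over_samples(samples):
    """Collapse a (samples, cells, genes) nested list to (cells, genes)."""
    n_samples = len(samples)
    n_cells = len(samples[0])
    n_genes = len(samples[0][0])
    return [
        [sum(samples[s][c][g] for s in range(n_samples)) / n_samples for g in range(n_genes)]
        for c in range(n_cells)
    ]


# Two hypothetical posterior samples of latent time for 2 cells x 3 genes.
samples = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[0.3, 0.4, 0.5], [0.6, 0.7, 0.8]],
]

mean_time = mean_over_samples(samples)  # one row per cell, one column per gene
```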
- REGVELOVI.get_marginal_ll(adata=None, indices=None, n_mc_samples=1000, batch_size=None, return_mean=True, dataloader=None, **kwargs)
Compute the marginal log-likelihood of the data.
The computation here is a biased estimator of the marginal log-likelihood of the data.
- Parameters:
adata (Optional[AnnData]) – AnnData object with var_names in the same order as the ones used to train the model. If None and dataloader is also None, it defaults to the object used to initialize the model.
indices (Optional[Sequence[int]]) – Indices of observations in adata to use. If None, defaults to all observations. Ignored if dataloader is not None.
n_mc_samples (int) – Number of Monte Carlo samples to use for the estimator. Passed into the module's marginal_ll method.
batch_size (Optional[int]) – Minibatch size for the forward pass. If None, defaults to scvi.settings.batch_size. Ignored if dataloader is not None.
return_mean (bool) – Whether to return the mean of the marginal log-likelihood or the marginal log-likelihood for each observation.
dataloader (Optional[Iterator[dict[str, Tensor | None]]]) – An iterator over minibatches of data on which to compute the metric. The minibatches should be formatted as a dictionary of Tensor with keys as expected by the model. If None, a dataloader is created from adata.
**kwargs – Additional keyword arguments to pass into the module's marginal_ll method.
- Returns:
If return_mean is True, returns the mean marginal log-likelihood. Otherwise returns a tensor of shape (n_obs,) with the marginal log-likelihood for each observation.
Notes
This is not the negative log-likelihood, so higher is better.
- REGVELOVI.get_permutation_scores(labels_key, adata=None)
Computes permutation scores for gene dynamics across cell types.
- Returns:
DataFrame of permutation scores for each gene and cell type.
A permuted AnnData object used in the scoring procedure.
- REGVELOVI.get_rates()
Returns the inferred kinetic parameters from the trained model.
This method extracts per-gene parameters from the trained decoder:
- beta (splicing rate)
- gamma (degradation rate)
- alpha_1 (initial transcriptional activation)
- REGVELOVI.get_reconstruction_error(adata=None, indices=None, batch_size=None, dataloader=None, return_mean=True, **kwargs)
Compute the reconstruction error on the data.
The reconstruction error is the negative log likelihood of the data given the latent variables. It is different from the marginal log-likelihood, but still gives good insights on the modeling of the data and is fast to compute. This is typically written as \(p(x \mid z)\), the likelihood term given one posterior sample.
- Parameters:
adata (Optional[AnnData]) – AnnData object with var_names in the same order as the ones used to train the model. If None and dataloader is also None, it defaults to the object used to initialize the model.
indices (Optional[Sequence[int]]) – Indices of observations in adata to use. If None, defaults to all observations. Ignored if dataloader is not None.
batch_size (Optional[int]) – Minibatch size for the forward pass. If None, defaults to scvi.settings.batch_size. Ignored if dataloader is not None.
dataloader (Optional[Iterator[dict[str, Tensor | None]]]) – An iterator over minibatches of data on which to compute the metric. The minibatches should be formatted as a dictionary of Tensor with keys as expected by the model. If None, a dataloader is created from adata.
return_mean (bool) – Whether to return the mean reconstruction loss or the reconstruction loss for each observation.
**kwargs – Additional keyword arguments to pass into the forward method of the module.
- Returns:
Reconstruction error for the data.
Notes
This is not the negative reconstruction error, so higher is better.
- REGVELOVI.get_velocity(adata=None, indices=None, gene_list=None, n_samples=1, n_samples_overall=None, batch_size=None, return_mean=True, return_numpy=None, clip=True)
Returns velocity estimates for each gene in each cell.
This function samples from the posterior and computes the expected RNA velocity as a function of unspliced and spliced abundances. Supports subsampling, batching, and output control.
Adapted from the veloVI repository: https://github.com/YosefLab/velovi/
- Parameters:
adata (Optional[AnnData]) – Annotated data object with the same structure as the one used during model setup. If None, uses the registered AnnData.
indices (Optional[Sequence[int]]) – List of cell indices to include. If None, all cells are used.
gene_list (Optional[Sequence[str]]) – List of genes to include in the output. If None, all genes are used.
n_samples (int) – Number of posterior samples to draw per cell.
n_samples_overall (Optional[int]) – Total number of cells to subsample. If set, n_samples is forced to 1.
batch_size (Optional[int]) – Mini-batch size for processing data. If None, uses default batch size in SCVI.
return_mean (bool) – If True, returns the mean over samples. If False, returns the full sample tensor.
return_numpy (Optional[bool]) – If True, returns a NumPy array. If False or None, returns a DataFrame with gene names as columns and cell names as rows.
clip (bool) – Whether to clip velocities to avoid negative spliced abundances.
- Return type:
ndarray | DataFrame
- Returns:
If n_samples > 1 and return_mean is False, returns an array of shape (samples, cells, genes). Otherwise, returns (cells, genes), as either a NumPy array or DataFrame depending on return_numpy.
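For intuition about "velocity as a function of unspliced and spliced abundances", the sketch below evaluates the standard splicing-kinetics expression ds/dt = beta * u - gamma * s for a single cell. This is the textbook RNA velocity relation and is assumed here for illustration; the reference above does not spell out the exact formula or the clipping rule used inside get_velocity, and the spliced_velocity helper and all numbers are hypothetical.

```python
def spliced_velocity(unspliced, spliced, beta, gamma):
    """Per-gene ds/dt = beta * u - gamma * s for one cell."""
    return [b * u - g * s for u, s, b, g in zip(unspliced, spliced, beta, gamma)]


# Hypothetical abundances and rates for two genes in one cell.
u = [2.0, 0.5]       # unspliced abundance
s = [1.0, 4.0]       # spliced abundance
beta = [1.0, 1.0]    # assumed splicing rates
gamma = [0.5, 0.5]   # assumed degradation rates

velocity = spliced_velocity(u, s, beta, gamma)
```

A positive entry means spliced abundance is increasing for that gene (first gene here), and a negative entry means it is decreasing (second gene).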
- classmethod REGVELOVI.load(dir_path, adata=None, accelerator='auto', device='auto', prefix=None, backup_url=None)
Instantiate a model from the saved output.
- Parameters:
dir_path (str) – Path to saved outputs.
adata (Union[AnnData, MuData, None]) – AnnData organized in the same way as data used to train model. It is not necessary to run setup_anndata, as AnnData is validated against the saved scvi setup dictionary. If None, will check for and load anndata saved with the model.
accelerator (str) – Supports passing different accelerator types (“cpu”, “gpu”, “tpu”, “ipu”, “hpu”, “mps”, “auto”) as well as custom accelerator instances.
device (int | str) – The device to use. Can be set to a non-negative index (int or str) or “auto” for automatic selection based on the chosen accelerator. If set to “auto” and accelerator is not determined to be “cpu”, then device will be set to the first available device.
backup_url (Optional[str]) – URL to retrieve saved outputs from if not present on disk.
- Returns:
Model with loaded state dictionaries.
Examples
>>> model = ModelClass.load(save_path, adata)
>>> model.get_....
- static REGVELOVI.load_registry(dir_path, prefix=None)
Return the full registry saved with the model.
- classmethod REGVELOVI.register_manager(adata_manager)
Registers an AnnDataManager instance with this model class.
Stores the AnnDataManager reference in a class-specific manager store. Intended for use in the setup_anndata() class method followed up by retrieval of the AnnDataManager via the _get_most_recent_anndata_manager() method in the model init method.
Notes
Subsequent calls to this method with an AnnDataManager instance referring to the same underlying AnnData object will overwrite the reference to the previous AnnDataManager.
- Parameters:
adata_manager (AnnDataManager)
- REGVELOVI.rgv_expression_fit(adata=None, indices=None, gene_list=None, n_samples=1, batch_size=None, return_mean=True, return_numpy=None)
Returns the model-fitted unspliced and spliced expression (u(t), s(t)).
This function estimates the predicted unspliced and spliced abundances for each gene in each cell by sampling from the posterior.
- Parameters:
adata (Optional[AnnData]) – Annotated data object with the same structure as the one used during model setup. If None, uses the registered AnnData.
indices (Optional[Sequence[int]]) – List of cell indices to include. If None, all cells are used.
gene_list (Optional[Sequence[str]]) – List of genes to include in the output. If None, all genes are used.
n_samples (int) – Number of posterior samples to draw per cell.
batch_size (Optional[int]) – Mini-batch size for processing data. If None, uses default batch size in SCVI.
return_mean (bool) – If True, returns the mean over samples. If False, returns the full sample tensor.
return_numpy (Optional[bool]) – If True, returns NumPy arrays. If False or None, returns DataFrames with gene names as columns and cell names as rows.
- Returns:
A tuple containing model-fitted spliced and unspliced abundances. If n_samples > 1 and return_mean is False, arrays are of shape (samples, cells, genes). Otherwise, shape is (cells, genes). Return type depends on return_numpy.
- REGVELOVI.save(dir_path, prefix=None, overwrite=False, save_anndata=False, save_kwargs=None, legacy_mudata_format=False, **anndata_write_kwargs)
Save the state of the model.
Neither the trainer optimizer state nor the trainer history are saved. Model files are not expected to be reproducibly saved and loaded across versions until we reach version 1.0.
- Parameters:
dir_path (str) – Path to a directory.
prefix (Optional[str]) – Prefix to prepend to saved file names.
overwrite (bool) – Overwrite existing data or not. If False and a directory already exists at dir_path, an error is raised.
save_anndata (bool) – If True, also saves the anndata.
save_kwargs (Optional[dict]) – Keyword arguments passed into save().
legacy_mudata_format (bool) – If True, saves the model var_names in the legacy format if the model was trained with a MuData object. The legacy format is a flat array with variable names across all modalities concatenated, while the new format is a dictionary with keys corresponding to the modality names and values corresponding to the variable names for each modality.
anndata_write_kwargs – Kwargs for write().
- classmethod REGVELOVI.setup_anndata(adata, spliced_layer=None, unspliced_layer=None, **kwargs)
Sets up the AnnData object for use with REGVELOVI.
This method registers the necessary layers in the AnnData object for use in training and inference.
- Parameters:
adata (AnnData) – Annotated data object with spliced and unspliced layers.
spliced_layer (Optional[str]) – Name of the layer in the AnnData object that contains spliced normalized expression.
unspliced_layer (Optional[str]) – Name of the layer in the AnnData object that contains unspliced normalized expression.
**kwargs – Additional keyword arguments passed to the AnnDataManager.
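A minimal sketch of how setup_anndata fits into a full run, using only the signatures listed on this page. The run_regvelo wrapper is hypothetical, the layer names "Ms" and "Mu" are an assumption about how the input AnnData was preprocessed, and default training settings are used. The import is deferred so the function can be defined without regvelo installed.

```python
def run_regvelo(adata, W, regulators):
    """Register layers, fit REGVELOVI, and return an AnnData with model outputs."""
    # Deferred import: regvelo is only needed when the function is called.
    from regvelo import REGVELOVI

    # Assumption: smoothed spliced/unspliced counts live in layers "Ms" and "Mu".
    REGVELOVI.setup_anndata(adata, spliced_layer="Ms", unspliced_layer="Mu")

    # W has shape [n_targets, n_regulators]; regulators lists the transcription factors.
    model = REGVELOVI(adata, W=W, regulators=regulators)
    model.train()

    # Returns a copy with the velocity and latent-time layers added.
    return model.add_regvelo_outputs_to_adata(adata=adata)
```

Registration has to happen before the model is constructed, since the constructor expects an AnnData that has already been registered via setup_anndata().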
- REGVELOVI.to_device(device)
Move model to device.
- Parameters:
device (str | int) – Device to move model to. Options: ‘cpu’ for CPU, integer GPU index (e.g., 0), or ‘cuda:X’ where X is the GPU index (e.g., ‘cuda:0’). See torch.device for more info.
Examples
>>> import scvi
>>> adata = scvi.data.synthetic_iid()
>>> model = scvi.model.SCVI(adata)
>>> model.to_device("cpu")  # moves model to CPU
>>> model.to_device("cuda:0")  # moves model to GPU 0
>>> model.to_device(0)  # also moves model to GPU 0
- REGVELOVI.train(max_epochs=1500, lr=0.01, weight_decay=1e-05, eps=1e-16, train_size=0.9, batch_size=None, validation_size=None, early_stopping=True, gradient_clip_val=10, plan_kwargs=None, optimizer='AdamW', **trainer_kwargs)
Train the REGVELOVI model. This method uses a modified SCVI TrainingPlan and TrainRunner to optimize model parameters using the registered AnnData object. It supports early stopping, gradient clipping, and custom optimizer settings.
Adapted from the training routine of the veloVI repository: https://github.com/YosefLab/velovi/.
- Parameters:
max_epochs (int) – Maximum number of training epochs.
lr (float) – Learning rate for the optimizer.
weight_decay (float) – Weight decay coefficient for regularization.
eps (float) – Epsilon value for numerical stability in the optimizer.
train_size (float) – Fraction of cells to use for training. Must be between 0 and 1.
batch_size (Optional[int]) – Mini-batch size used during training. If None, defaults to the full dataset.
validation_size (Optional[float]) – Fraction of cells to use for validation. If None, defaults to 1 - train_size. If train_size + validation_size < 1.0, the remainder is used as a test set.
early_stopping (bool) – Whether to perform early stopping based on validation loss.
gradient_clip_val (float) – Maximum allowed gradient value to clip gradients during backpropagation.
plan_kwargs (Optional[dict]) – Additional keyword arguments passed to the ModifiedTrainingPlan.
optimizer (str) – Optimizer to use for training.
**trainer_kwargs – Additional keyword arguments passed to the SCVI TrainRunner.
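The interaction between train_size and validation_size can be written out as a few lines of arithmetic. The split_fractions helper below is a hypothetical illustration of the rule stated in the parameter descriptions, not the data splitter used by the library, and it works on fractions only (how fractions are rounded to whole cells is not covered here).

```python
def split_fractions(train_size=0.9, validation_size=None):
    """Return (train, validation, test) fractions following the documented rule."""
    if not 0.0 < train_size <= 1.0:
        raise ValueError("train_size must be between 0 and 1")
    if validation_size is None:
        # Documented default: everything not used for training is validation.
        validation_size = 1.0 - train_size
    if train_size + validation_size > 1.0 + 1e-9:
        raise ValueError("train_size + validation_size must not exceed 1")
    # Whatever is left over becomes the test set.
    test_size = max(0.0, 1.0 - train_size - validation_size)
    return train_size, validation_size, test_size


default_split = split_fractions()          # defaults from the train() signature
with_test_set = split_fractions(0.8, 0.1)  # leaves a remainder for testing
```

With the defaults there is no test set, whereas train_size=0.8 with validation_size=0.1 leaves roughly 10% of cells as test observations.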
- REGVELOVI.view_anndata_setup(adata=None, hide_state_registries=False)
Print summary of the setup for the initial AnnData or a given AnnData object.
- Parameters:
adata (Union[AnnData, MuData, None]) – AnnData object set up with setup_anndata or transfer_fields().
hide_state_registries (bool) – If True, prints a shortened summary without details of each state registry.