Model selection using a human hindbrain dataset¶
In this notebook, we use RegVelo’s ModelCaomparison class to select a model setup on the human hindbrain dataset from Braun, E. et al, 2023 that we previously preprocessed. For model evaluation, we use the cross-boundary correctness (CBC) based on known hindbrain trajectories.
A detailed tutorial on how to use RegVelo’s ModelComparison class is provided in the Model Comparison Tutorial.
Library import¶
import numpy as np
import scanpy as sc
import cellrank as cr
import scvi
import scvelo as scv
from regvelo import REGVELOVI
from regvelo import ModelComparison
import regvelo as rgv
General settings¶
scvi.settings.seed = 0
scv.settings.set_figure_params("scvelo", dpi=80, transparent=True, fontsize=14, color_map="viridis")
%matplotlib inline
Load data¶
In the following, we load the preprocessed embryonic hindbrain dataset from the previous tutorial notebook and define the ground truth transitions used for model evaluation.
adata = rgv.datasets.hindbrain(data_type = "preprocessed")
STATE_TRANSITIONS = [
("Hindbrain neuroblast", "Pontocerebellar glutamatergic"),
("Hindbrain neuroblast", "Dorsal interneurons"),
("Hindbrain neuroblast", "Dorsal sensory neuroblasts"),
("Hindbrain radial glia", "Visceral motor neuron"),
]
ModelComparison¶
Object initialization¶
We now initialize the ModelComparison object and sepcify the parameter state_transition.
comp = ModelComparison(adata = adata, state_transition=STATE_TRANSITIONS)
Train¶
We set batch_size to np.min(adata.shape[0], 5000) to avoid out of memory issue.
comp.train(model_list=['soft','hard','soft_regularized'],
lam2=[1.0],
n_repeat=1,
batch_size=min(adata.shape[0], 5000))
Epoch 442/1500: 29%|██▉ | 442/1500 [40:39<1:37:18, 5.52s/it, v_num=1]
Monitored metric elbo_validation did not improve in the last 45 records. Best score: -2095.273. Signaling Trainer to stop.
Epoch 848/1500: 57%|█████▋ | 848/1500 [1:17:29<59:35, 5.48s/it, v_num=1]
Monitored metric elbo_validation did not improve in the last 45 records. Best score: -2043.481. Signaling Trainer to stop.
Epoch 685/1500: 46%|████▌ | 685/1500 [1:03:17<1:15:18, 5.54s/it, v_num=1]
Monitored metric elbo_validation did not improve in the last 45 records. Best score: -2030.300. Signaling Trainer to stop.
['soft_0', 'hard_0', 'soft_regularized\nlam2:1.0_0']
Evaluate and save¶
We use CBC to evaluate if the model’s inferred transitions align with the ground truth state changes and specify the parameter side_key, which contains cell-type annotations stored in adata.obs.
comp.evaluate(side_information='CBC',
side_key='regvelo_annotation')
('df_CBC',
Model Corr Run
0 soft 0.695765 0
1 hard 0.686511 0
2 soft_regularized\nlam2:1.0 0.744029 0)
vae_sr = comp.MODEL_TRAINED['soft_regularized\nlam2:1.0_0']
vae_sr.save('vae_sr')
Trajectory and latent time prediction¶
vae_sr = REGVELOVI.load('vae_sr', adata)
rgv.tl.set_output(adata, vae_sr, n_samples=30, batch_size=5000)
INFO File vae_sr/model.pt already downloaded
scv.tl.velocity_graph(adata)
computing velocity graph (using 1/128 cores)
WARNING: Unable to create progress bar. Consider installing `tqdm` as `pip install tqdm` and `ipywidgets` as `pip install ipywidgets`,
or disable the progress bar using `show_progress_bar=False`.
finished (0:01:18) --> added
'velocity_graph', sparse matrix with cosine correlations (adata.uns)
scv.settings.set_figure_params("scvelo", dpi=80, transparent=True, fontsize=14, color_map="viridis")