Model selection using a human hindbrain dataset

In this notebook, we use RegVelo’s ModelCaomparison class to select a model setup on the human hindbrain dataset from Braun, E. et al, 2023 that we previously preprocessed. For model evaluation, we use the cross-boundary correctness (CBC) based on known hindbrain trajectories.

A detailed tutorial on how to use RegVelo’s ModelComparison class is provided in the Model Comparison Tutorial.

Library import

import numpy as np
import scanpy as sc

import cellrank as cr
import scvi
import scvelo as scv

from regvelo import REGVELOVI
from regvelo import ModelComparison
import regvelo as rgv

General settings

scvi.settings.seed = 0
scv.settings.set_figure_params("scvelo", dpi=80, transparent=True, fontsize=14, color_map="viridis")
%matplotlib inline

Load data

In the following, we load the preprocessed embryonic hindbrain dataset from the previous tutorial notebook and define the ground truth transitions used for model evaluation.

adata = rgv.datasets.hindbrain(data_type = "preprocessed")
STATE_TRANSITIONS = [
    ("Hindbrain neuroblast", "Pontocerebellar glutamatergic"),
    ("Hindbrain neuroblast", "Dorsal interneurons"),
    ("Hindbrain neuroblast", "Dorsal sensory neuroblasts"),
    ("Hindbrain radial glia", "Visceral motor neuron"),
]

ModelComparison

Object initialization

We now initialize the ModelComparison object and sepcify the parameter state_transition.

comp = ModelComparison(adata = adata, state_transition=STATE_TRANSITIONS)

Train

We set batch_size to np.min(adata.shape[0], 5000) to avoid out of memory issue.

comp.train(model_list=['soft','hard','soft_regularized'],
           lam2=[1.0],
           n_repeat=1,
           batch_size=min(adata.shape[0], 5000))
Epoch 442/1500:  29%|██▉       | 442/1500 [40:39<1:37:18,  5.52s/it, v_num=1]
Monitored metric elbo_validation did not improve in the last 45 records. Best score: -2095.273. Signaling Trainer to stop.
Epoch 848/1500:  57%|█████▋    | 848/1500 [1:17:29<59:35,  5.48s/it, v_num=1]

Monitored metric elbo_validation did not improve in the last 45 records. Best score: -2043.481. Signaling Trainer to stop.
Epoch 685/1500:  46%|████▌     | 685/1500 [1:03:17<1:15:18,  5.54s/it, v_num=1]
Monitored metric elbo_validation did not improve in the last 45 records. Best score: -2030.300. Signaling Trainer to stop.
['soft_0', 'hard_0', 'soft_regularized\nlam2:1.0_0']

Evaluate and save

We use CBC to evaluate if the model’s inferred transitions align with the ground truth state changes and specify the parameter side_key, which contains cell-type annotations stored in adata.obs.

comp.evaluate(side_information='CBC',
              side_key='regvelo_annotation')
('df_CBC',
                         Model      Corr Run
 0                        soft  0.695765   0
 1                        hard  0.686511   0
 2  soft_regularized\nlam2:1.0  0.744029   0)
comp.plot_results(side_information='CBC')
vae_sr = comp.MODEL_TRAINED['soft_regularized\nlam2:1.0_0']
vae_sr.save('vae_sr')

Trajectory and latent time prediction

vae_sr = REGVELOVI.load('vae_sr', adata)
rgv.tl.set_output(adata, vae_sr, n_samples=30, batch_size=5000)
INFO     File vae_sr/model.pt already downloaded
scv.tl.velocity_graph(adata)
computing velocity graph (using 1/128 cores)
WARNING: Unable to create progress bar. Consider installing `tqdm` as `pip install tqdm` and `ipywidgets` as `pip install ipywidgets`,
or disable the progress bar using `show_progress_bar=False`.
    finished (0:01:18) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
scv.settings.set_figure_params("scvelo", dpi=80, transparent=True, fontsize=14, color_map="viridis")
scv.pl.velocity_embedding_stream(adata, basis="umap", title="", color = "CellClass")
adata.obs["latent_time"] = adata.layers["fit_t"].mean(1)
scv.pl.scatter(adata, basis="umap", title="",color = ["latent_time"],legend_loc = "on data")