Preprocess data and add prior GRN information using a human hindbrain dataset
In this notebook, we demonstrate the preprocessing steps and the addition of prior gene regulatory network (GRN) information required before running the RegVelo pipeline. The dataset used in this tutorial is a subset of the first-trimester developing human brain dataset collected by Braun et al. (2023).
A detailed description of the preprocessing steps is provided in the RegVelo manuscript.
Library import
import scanpy as sc
import numpy as np
import pandas as pd
import scvelo as scv
import scvi
import regvelo as rgv
General settings
scvi.settings.seed = 0
scv.settings.set_figure_params("scvelo", dpi=80, transparent=True, fontsize=14, color_map="viridis")
%matplotlib inline
Load data
In the following, we load the embryonic hindbrain dataset, which has already been annotated (see RegVelo manuscript). We also load the GRN learned from the human embryonic hindbrain single-cell multi-ome dataset (see RegVelo manuscript).
adata = rgv.datasets.hindbrain(data_type = "original")
adata
AnnData object with n_obs × n_vars = 49469 × 30958
obs: 'background_fraction', 'cell_probability', 'cell_size', 'droplet_efficiency', 'assignment', 'scDblFinder_DropletType', 'scDblFinder_Score', 'scrublet_DropletType', 'Tissue', 'batch', 'Experiment', 'Type', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'log1p_total_counts_hb', 'pct_counts_hb', 'scrublet_score', 'scrublet_cluster_score_sample', 'scrublet_bh_pval_sample', 'background_fraction_cluster_score_sample', 'background_fraction_bh_pval_sample', 'paper_code', 'method', 'method2', 'FACS', 'stage', 'pcw_cont', 'bulk_name', '10X_run', 'n_genes', 'S_score', 'G2M_score', 'phase', 'leiden_res1', 'leiden_res1_R', 'Celltypist_DHB_predicted_labels', 'Celltypist_DHB_over_clustering', 'Celltypist_DHB_majority_voting', 'Celltypist_DHB_conf_score', 'Celltypist_GSE155121_full_predicted_labels', 'Celltypist_GSE155121_full_over_clustering', 'Celltypist_GSE155121_full_majority_voting', 'Celltypist_GSE155121_full_conf_score', 'Celltypist_GSE157329_developmental_system_full_predicted_labels', 'Celltypist_GSE157329_developmental_system_full_over_clustering', 'Celltypist_GSE157329_developmental_system_full_majority_voting', 'Celltypist_GSE157329_developmental_system_full_conf_score', 'Celltypist_GSE157329_annotation_full_predicted_labels', 'Celltypist_GSE157329_annotation_full_over_clustering', 'Celltypist_GSE157329_annotation_full_majority_voting', 'Celltypist_GSE157329_annotation_full_conf_score', 'Celltypist_GSE157329_final_annotation_full_predicted_labels', 'Celltypist_GSE157329_final_annotation_full_over_clustering', 'Celltypist_GSE157329_final_annotation_full_majority_voting', 'Celltypist_GSE157329_final_annotation_full_conf_score', 'STEMS_annotation_l1', '_scvi_batch', '_scvi_labels', 'leiden_SCVI', 'Celltypist_Immune_All_High_predicted_labels', 
'Celltypist_Immune_All_High_over_clustering', 'Celltypist_Immune_All_High_majority_voting', 'Celltypist_Immune_All_High_conf_score', 'Celltypist_Immune_All_Low_predicted_labels', 'Celltypist_Immune_All_Low_over_clustering', 'Celltypist_Immune_All_Low_majority_voting', 'Celltypist_Immune_All_Low_conf_score', 'Teichmann_Celltype_fig1_full_predicted_labels', 'Teichmann_Celltype_fig1_full_over_clustering', 'Teichmann_Celltype_fig1_full_majority_voting', 'Teichmann_Celltype_fig1_full_conf_score', 'Teichmann_bone_full_predicted_labels', 'Teichmann_bone_full_over_clustering', 'Teichmann_bone_full_majority_voting', 'Teichmann_bone_full_conf_score', 'Teichmann_anatomical_site_full_predicted_labels', 'Teichmann_anatomical_site_full_over_clustering', 'Teichmann_anatomical_site_full_majority_voting', 'Teichmann_anatomical_site_full_conf_score', '_scvi_raw_norm_scaling', 'STEMS_annotation_l2', 'STEMS_annotation_l3', 'CellClass', 'CellCycleFraction', 'Clusters', 'Donor', 'DoubletFlag', 'DoubletScore', 'DropletClass', 'MitoFraction', 'NGenes', 'PrevClusters', 'Region', 'Sex', 'Subdivision', 'Subregion', 'TopLevelCluster', 'TotalUMIs', 'UnsplicedFraction', 'ValidCells', 'MB_Annotation_mb', 'MB_Clusters', 'MB_TopLevelCluster', 'vMB_Clusters', 'vMB_LRprediction_labels', 'CellClass_Subregion', 'MB_regvelo_annotation', 'vMB_regvelo_annotation', 'reference', 'STEMS_annotation_l2_SCANVI', 'STEMS_annotation_l2_prediction', 'MB_regvelo_annotation_SCANVI', 'MB_regvelo_annotation_prediction', 'vMB_regvelo_annotation_SCANVI', 'vMB_regvelo_annotation_prediction', 'CellClass_Subregion_SCANVI', 'CellClass_Subregion_prediction', 'leiden', 'batch_hvg', 'regvelo_annotation', 'regvelo_state'
var: 'gene_id'
uns: 'CellClass_Subregion_colors', 'Experiment_colors', 'MB_regvelo_annotation_colors', 'MB_regvelo_annotation_prediction_colors', 'STEMS_annotation_l2_colors', '_scvi_manager_uuid', '_scvi_uuid', 'batch_colors', 'hvg', 'leiden', 'leiden_SCVI', 'log1p', 'neighbors', 'pca', 'reference_colors', 'regvelo_annotation_colors', 'regvelo_state_colors', 'tsne', 'umap', 'vMB_regvelo_annotation_colors', 'vMB_regvelo_annotation_prediction_colors'
obsm: 'X_Embedding', 'X_Factors', 'X_mde_scanvi_CellClass_Subregion', 'X_mde_scanvi_MB_regvelo_annotation', 'X_mde_scanvi_STEMS_annotation_l2', 'X_mde_scanvi_vMB_regvelo_annotation', 'X_pca', 'X_scANVI_CellClass_Subregion', 'X_scANVI_MB_regvelo_annotation', 'X_scANVI_STEMS_annotation_l2', 'X_scANVI_vMB_regvelo_annotation', 'X_scVI', 'X_scVI_mde', 'X_tsne', 'X_umap', '_scvi_extra_categorical_covs', 'gene_expression_encoding'
layers: 'lognorm', 'spliced', 'unspliced'
obsp: 'connectivities', 'distances'
sc.pp.neighbors(adata, n_neighbors=30, n_pcs=50, use_rep="X_scVI")
sc.tl.umap(adata)
scv.pl.scatter(adata, basis="umap", title="", color=["regvelo_annotation", "CellClass"], legend_loc="on data")
eGRN = rgv.datasets.hindbrain_grn()
Create prior GRN for RegVelo
In the following, we convert the loaded GRN edge list into the square gene-by-gene matrix that serves as the prior GRN for the RegVelo pipeline.
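# Keep only the regulator (TF) and target (Gene) columns of the edge list
eGRN = eGRN.loc[:, ["TF", "Gene"]].copy()
# TF x target table counting the edges between each pair
reg = pd.crosstab(eGRN["TF"], eGRN["Gene"])
TF = np.unique(reg.index.tolist())
genes = np.unique(TF.tolist() + reg.columns.tolist())
# Square matrix over all TFs and targets: rows are regulators, columns are targets
GRN = pd.DataFrame(0, index=genes, columns=genes)
GRN.loc[TF, reg.columns.tolist()] = np.array(reg)
# Drop genes that have neither incoming nor outgoing edges
mask = (GRN.sum(0) != 0) | (GRN.sum(1) != 0)
GRN = GRN.loc[mask, mask].copy()
print(f"Done! processed GRN with {reg.shape[0]} TFs and {reg.shape[1]} targets")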
Done! processed GRN with 151 TFs and 4219 targets
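To make the matrix construction above concrete, the following minimal sketch repeats the same steps on a tiny edge list. The three edges and gene names are a hypothetical toy example and are not taken from the hindbrain GRN; only pandas and NumPy are assumed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy edge list (not the real eGRN): one row per TF -> target edge.
edges = pd.DataFrame(
    {
        "TF": ["SOX2", "SOX2", "PAX6"],
        "Gene": ["PAX6", "HES1", "HES1"],
    }
)

# TF x target table with a 1 wherever an edge exists.
reg = pd.crosstab(edges["TF"], edges["Gene"])

# Square matrix over the union of TFs and targets.
tfs = reg.index.tolist()
genes = np.unique(tfs + reg.columns.tolist())
grn = pd.DataFrame(0, index=genes, columns=genes)
grn.loc[tfs, reg.columns.tolist()] = reg.to_numpy()

print(grn)
```

In this layout, rows are regulators and columns are targets, so `grn.loc["SOX2", "PAX6"]` is 1 while the `HES1` row is all zeros because HES1 regulates nothing in the toy network. Note that the tutorial later passes the transpose, `GRN.T`, to `rgv.pp.set_prior_grn`.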
Preprocess data and align prior GRN for RegVelo pipeline
We perform the preprocessing steps, consisting of gene filtering and normalization. We then compute the first- and second-order moments (means and uncentered variances) with scv.pp.moments, which are needed for velocity estimation. Note that this step can be time-consuming.
Note
If the preprocessing steps have already been performed, you can skip this section and proceed directly to loading the prior GRN.
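# Keep genes with at least 20 counts shared between the spliced and unspliced layers
scv.pp.filter_genes(adata, min_shared_counts=20)
# Normalize each cell by its total counts
scv.pp.normalize_per_cell(adata)
# Keep the 3000 most variable genes
scv.pp.filter_genes_dispersion(adata, n_top_genes=3000)
# With n_neighbors=None, the existing neighbor graph (computed above on X_scVI) is used
scv.pp.moments(adata, n_pcs=None, n_neighbors=None)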
Filtered out 19200 genes that are detected 20 counts (shared).
Normalized count data: X, spliced, unspliced.
Extracted 3000 highly variable genes.
computing moments based on connectivities
finished (0:00:06) --> added
'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
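The moments computed above can be thought of as neighborhood averages over the cell-cell graph. The sketch below illustrates this idea on toy data with plain NumPy. It is a simplified illustration under stated assumptions (a hand-written connectivity matrix, self-inclusion, and row normalization), not the scvelo implementation, and all arrays are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 4 cells x 2 genes of normalized spliced counts.
spliced = np.array(
    [
        [0.0, 2.0],
        [2.0, 2.0],
        [4.0, 0.0],
        [6.0, 0.0],
    ]
)

# Hypothetical symmetric neighbor graph between the 4 cells (no self-loops).
conn = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

# Assumption of this sketch: each cell is part of its own neighborhood,
# and rows are normalized so that they act as averaging weights.
weights = conn + np.eye(conn.shape[0])
weights /= weights.sum(axis=1, keepdims=True)

# First-order moment: neighborhood mean of each gene.
Ms = weights @ spliced
# Second-order moment: neighborhood mean of the squared expression
# (an uncentered variance).
Mss = weights @ spliced**2

print(Ms)
```

Averaging over neighbors smooths the sparse per-cell counts, which is why the resulting 'Ms' and 'Mu' layers, rather than the raw layers, are used downstream.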
adata
AnnData object with n_obs × n_vars = 49469 × 3000
obs: 'background_fraction', 'cell_probability', 'cell_size', 'droplet_efficiency', 'assignment', 'scDblFinder_DropletType', 'scDblFinder_Score', 'scrublet_DropletType', 'Tissue', 'batch', 'Experiment', 'Type', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'log1p_total_counts_hb', 'pct_counts_hb', 'scrublet_score', 'scrublet_cluster_score_sample', 'scrublet_bh_pval_sample', 'background_fraction_cluster_score_sample', 'background_fraction_bh_pval_sample', 'paper_code', 'method', 'method2', 'FACS', 'stage', 'pcw_cont', 'bulk_name', '10X_run', 'n_genes', 'S_score', 'G2M_score', 'phase', 'leiden_res1', 'leiden_res1_R', 'Celltypist_DHB_predicted_labels', 'Celltypist_DHB_over_clustering', 'Celltypist_DHB_majority_voting', 'Celltypist_DHB_conf_score', 'Celltypist_GSE155121_full_predicted_labels', 'Celltypist_GSE155121_full_over_clustering', 'Celltypist_GSE155121_full_majority_voting', 'Celltypist_GSE155121_full_conf_score', 'Celltypist_GSE157329_developmental_system_full_predicted_labels', 'Celltypist_GSE157329_developmental_system_full_over_clustering', 'Celltypist_GSE157329_developmental_system_full_majority_voting', 'Celltypist_GSE157329_developmental_system_full_conf_score', 'Celltypist_GSE157329_annotation_full_predicted_labels', 'Celltypist_GSE157329_annotation_full_over_clustering', 'Celltypist_GSE157329_annotation_full_majority_voting', 'Celltypist_GSE157329_annotation_full_conf_score', 'Celltypist_GSE157329_final_annotation_full_predicted_labels', 'Celltypist_GSE157329_final_annotation_full_over_clustering', 'Celltypist_GSE157329_final_annotation_full_majority_voting', 'Celltypist_GSE157329_final_annotation_full_conf_score', 'STEMS_annotation_l1', '_scvi_batch', '_scvi_labels', 'leiden_SCVI', 'Celltypist_Immune_All_High_predicted_labels', 
'Celltypist_Immune_All_High_over_clustering', 'Celltypist_Immune_All_High_majority_voting', 'Celltypist_Immune_All_High_conf_score', 'Celltypist_Immune_All_Low_predicted_labels', 'Celltypist_Immune_All_Low_over_clustering', 'Celltypist_Immune_All_Low_majority_voting', 'Celltypist_Immune_All_Low_conf_score', 'Teichmann_Celltype_fig1_full_predicted_labels', 'Teichmann_Celltype_fig1_full_over_clustering', 'Teichmann_Celltype_fig1_full_majority_voting', 'Teichmann_Celltype_fig1_full_conf_score', 'Teichmann_bone_full_predicted_labels', 'Teichmann_bone_full_over_clustering', 'Teichmann_bone_full_majority_voting', 'Teichmann_bone_full_conf_score', 'Teichmann_anatomical_site_full_predicted_labels', 'Teichmann_anatomical_site_full_over_clustering', 'Teichmann_anatomical_site_full_majority_voting', 'Teichmann_anatomical_site_full_conf_score', '_scvi_raw_norm_scaling', 'STEMS_annotation_l2', 'STEMS_annotation_l3', 'CellClass', 'CellCycleFraction', 'Clusters', 'Donor', 'DoubletFlag', 'DoubletScore', 'DropletClass', 'MitoFraction', 'NGenes', 'PrevClusters', 'Region', 'Sex', 'Subdivision', 'Subregion', 'TopLevelCluster', 'TotalUMIs', 'UnsplicedFraction', 'ValidCells', 'MB_Annotation_mb', 'MB_Clusters', 'MB_TopLevelCluster', 'vMB_Clusters', 'vMB_LRprediction_labels', 'CellClass_Subregion', 'MB_regvelo_annotation', 'vMB_regvelo_annotation', 'reference', 'STEMS_annotation_l2_SCANVI', 'STEMS_annotation_l2_prediction', 'MB_regvelo_annotation_SCANVI', 'MB_regvelo_annotation_prediction', 'vMB_regvelo_annotation_SCANVI', 'vMB_regvelo_annotation_prediction', 'CellClass_Subregion_SCANVI', 'CellClass_Subregion_prediction', 'leiden', 'batch_hvg', 'regvelo_annotation', 'regvelo_state', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts'
var: 'gene_id', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
uns: 'CellClass_Subregion_colors', 'Experiment_colors', 'MB_regvelo_annotation_colors', 'MB_regvelo_annotation_prediction_colors', 'STEMS_annotation_l2_colors', '_scvi_manager_uuid', '_scvi_uuid', 'batch_colors', 'hvg', 'leiden', 'leiden_SCVI', 'log1p', 'neighbors', 'pca', 'reference_colors', 'regvelo_annotation_colors', 'regvelo_state_colors', 'tsne', 'umap', 'vMB_regvelo_annotation_colors', 'vMB_regvelo_annotation_prediction_colors', 'CellClass_colors'
obsm: 'X_Embedding', 'X_Factors', 'X_mde_scanvi_CellClass_Subregion', 'X_mde_scanvi_MB_regvelo_annotation', 'X_mde_scanvi_STEMS_annotation_l2', 'X_mde_scanvi_vMB_regvelo_annotation', 'X_pca', 'X_scANVI_CellClass_Subregion', 'X_scANVI_MB_regvelo_annotation', 'X_scANVI_STEMS_annotation_l2', 'X_scANVI_vMB_regvelo_annotation', 'X_scVI', 'X_scVI_mde', 'X_tsne', 'X_umap', '_scvi_extra_categorical_covs', 'gene_expression_encoding'
layers: 'lognorm', 'spliced', 'unspliced', 'Ms', 'Mu'
obsp: 'connectivities', 'distances'
Note
The function rgv.pp.set_prior_grn aligns the loaded GRN with the gene expression data in adata. By default, it removes genes without incoming or outgoing regulatory edges.
adata = rgv.pp.set_prior_grn(adata, GRN.T)
adata
AnnData object with n_obs × n_vars = 49469 × 1273
obs: 'background_fraction', 'cell_probability', 'cell_size', 'droplet_efficiency', 'assignment', 'scDblFinder_DropletType', 'scDblFinder_Score', 'scrublet_DropletType', 'Tissue', 'batch', 'Experiment', 'Type', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'log1p_total_counts_hb', 'pct_counts_hb', 'scrublet_score', 'scrublet_cluster_score_sample', 'scrublet_bh_pval_sample', 'background_fraction_cluster_score_sample', 'background_fraction_bh_pval_sample', 'paper_code', 'method', 'method2', 'FACS', 'stage', 'pcw_cont', 'bulk_name', '10X_run', 'n_genes', 'S_score', 'G2M_score', 'phase', 'leiden_res1', 'leiden_res1_R', 'Celltypist_DHB_predicted_labels', 'Celltypist_DHB_over_clustering', 'Celltypist_DHB_majority_voting', 'Celltypist_DHB_conf_score', 'Celltypist_GSE155121_full_predicted_labels', 'Celltypist_GSE155121_full_over_clustering', 'Celltypist_GSE155121_full_majority_voting', 'Celltypist_GSE155121_full_conf_score', 'Celltypist_GSE157329_developmental_system_full_predicted_labels', 'Celltypist_GSE157329_developmental_system_full_over_clustering', 'Celltypist_GSE157329_developmental_system_full_majority_voting', 'Celltypist_GSE157329_developmental_system_full_conf_score', 'Celltypist_GSE157329_annotation_full_predicted_labels', 'Celltypist_GSE157329_annotation_full_over_clustering', 'Celltypist_GSE157329_annotation_full_majority_voting', 'Celltypist_GSE157329_annotation_full_conf_score', 'Celltypist_GSE157329_final_annotation_full_predicted_labels', 'Celltypist_GSE157329_final_annotation_full_over_clustering', 'Celltypist_GSE157329_final_annotation_full_majority_voting', 'Celltypist_GSE157329_final_annotation_full_conf_score', 'STEMS_annotation_l1', '_scvi_batch', '_scvi_labels', 'leiden_SCVI', 'Celltypist_Immune_All_High_predicted_labels', 
'Celltypist_Immune_All_High_over_clustering', 'Celltypist_Immune_All_High_majority_voting', 'Celltypist_Immune_All_High_conf_score', 'Celltypist_Immune_All_Low_predicted_labels', 'Celltypist_Immune_All_Low_over_clustering', 'Celltypist_Immune_All_Low_majority_voting', 'Celltypist_Immune_All_Low_conf_score', 'Teichmann_Celltype_fig1_full_predicted_labels', 'Teichmann_Celltype_fig1_full_over_clustering', 'Teichmann_Celltype_fig1_full_majority_voting', 'Teichmann_Celltype_fig1_full_conf_score', 'Teichmann_bone_full_predicted_labels', 'Teichmann_bone_full_over_clustering', 'Teichmann_bone_full_majority_voting', 'Teichmann_bone_full_conf_score', 'Teichmann_anatomical_site_full_predicted_labels', 'Teichmann_anatomical_site_full_over_clustering', 'Teichmann_anatomical_site_full_majority_voting', 'Teichmann_anatomical_site_full_conf_score', '_scvi_raw_norm_scaling', 'STEMS_annotation_l2', 'STEMS_annotation_l3', 'CellClass', 'CellCycleFraction', 'Clusters', 'Donor', 'DoubletFlag', 'DoubletScore', 'DropletClass', 'MitoFraction', 'NGenes', 'PrevClusters', 'Region', 'Sex', 'Subdivision', 'Subregion', 'TopLevelCluster', 'TotalUMIs', 'UnsplicedFraction', 'ValidCells', 'MB_Annotation_mb', 'MB_Clusters', 'MB_TopLevelCluster', 'vMB_Clusters', 'vMB_LRprediction_labels', 'CellClass_Subregion', 'MB_regvelo_annotation', 'vMB_regvelo_annotation', 'reference', 'STEMS_annotation_l2_SCANVI', 'STEMS_annotation_l2_prediction', 'MB_regvelo_annotation_SCANVI', 'MB_regvelo_annotation_prediction', 'vMB_regvelo_annotation_SCANVI', 'vMB_regvelo_annotation_prediction', 'CellClass_Subregion_SCANVI', 'CellClass_Subregion_prediction', 'leiden', 'batch_hvg', 'regvelo_annotation', 'regvelo_state', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts'
var: 'gene_id', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
uns: 'CellClass_Subregion_colors', 'Experiment_colors', 'MB_regvelo_annotation_colors', 'MB_regvelo_annotation_prediction_colors', 'STEMS_annotation_l2_colors', '_scvi_manager_uuid', '_scvi_uuid', 'batch_colors', 'hvg', 'leiden', 'leiden_SCVI', 'log1p', 'neighbors', 'pca', 'reference_colors', 'regvelo_annotation_colors', 'regvelo_state_colors', 'tsne', 'umap', 'vMB_regvelo_annotation_colors', 'vMB_regvelo_annotation_prediction_colors', 'CellClass_colors', 'regulators', 'targets', 'skeleton', 'network'
obsm: 'X_Embedding', 'X_Factors', 'X_mde_scanvi_CellClass_Subregion', 'X_mde_scanvi_MB_regvelo_annotation', 'X_mde_scanvi_STEMS_annotation_l2', 'X_mde_scanvi_vMB_regvelo_annotation', 'X_pca', 'X_scANVI_CellClass_Subregion', 'X_scANVI_MB_regvelo_annotation', 'X_scANVI_STEMS_annotation_l2', 'X_scANVI_vMB_regvelo_annotation', 'X_scVI', 'X_scVI_mde', 'X_tsne', 'X_umap', '_scvi_extra_categorical_covs', 'gene_expression_encoding'
layers: 'lognorm', 'spliced', 'unspliced', 'Ms', 'Mu'
obsp: 'connectivities', 'distances'
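The alignment described in the note above (restricting the GRN to measured genes and dropping genes without any regulatory edge) can be sketched with pandas as follows. The helper `align_grn` and the toy network are hypothetical and only illustrate the idea; they are not the implementation of rgv.pp.set_prior_grn, whose internals may differ.

```python
import pandas as pd


def align_grn(grn: pd.DataFrame, var_names: list[str]) -> pd.DataFrame:
    """Hypothetical helper: restrict a square GRN to the measured genes
    and drop genes left without incoming or outgoing edges."""
    shared = [g for g in var_names if g in grn.index]
    sub = grn.loc[shared, shared]
    has_edge = (sub.sum(axis=0) != 0) | (sub.sum(axis=1) != 0)
    return sub.loc[has_edge, has_edge]


# Toy GRN with edges A -> B and C -> D (rows: regulators, columns: targets).
genes = ["A", "B", "C", "D"]
grn = pd.DataFrame(0, index=genes, columns=genes)
grn.loc["A", "B"] = 1
grn.loc["C", "D"] = 1

# Toy expression data measures A, B, C and an extra gene E, but not D.
aligned = align_grn(grn, ["A", "B", "C", "E"])
print(aligned)
```

Here gene E is dropped because it is absent from the GRN, and gene C is dropped because its only target, D, is not measured. This mirrors why the number of genes can shrink substantially after alignment, as in the drop from 3000 to 1273 genes above.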
Note
The following steps ensure that only velocity-informative genes and TF genes are considered, and update adata.uns["skeleton"] accordingly. Velocity-informative genes are selected with rgv.pp.preprocess_data, which min-max scales the spliced and unspliced layers and retains only genes with non-negative fitted degradation rates \(\gamma\) and non-negative \(R^2\) values from scv.tl.velocity with mode="deterministic". The function rgv.pp.filter_genes then refines the GRN so that each gene has at least one regulator, which further reduces the number of genes considered.
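To give an intuition for the selection of velocity-informative genes, the sketch below min-max scales toy spliced and unspliced values, fits a steady-state line \(u \approx \gamma s\) through the origin for each gene, and keeps genes with non-negative \(\gamma\) and \(R^2\). This is a simplified stand-in using ordinary least squares on all cells; the actual fit in scv.tl.velocity differs in its details, and the helper names and data are hypothetical.

```python
import numpy as np


def min_max_scale(x: np.ndarray) -> np.ndarray:
    """Scale each gene (column) to the [0, 1] range."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / np.where(hi > lo, hi - lo, 1.0)


def steady_state_fit(s: np.ndarray, u: np.ndarray):
    """Per-gene least-squares slope through the origin (u ~ gamma * s)
    and the R^2 of that fit."""
    gamma = (s * u).sum(axis=0) / (s * s).sum(axis=0)
    residual = u - gamma * s
    ss_res = (residual**2).sum(axis=0)
    ss_tot = ((u - u.mean(axis=0)) ** 2).sum(axis=0)
    return gamma, 1.0 - ss_res / ss_tot


# Hypothetical toy data: 4 cells x 2 genes.
# Gene 0: unspliced rises with spliced. Gene 1: unspliced falls as spliced rises.
spliced = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
unspliced = np.array([[0.0, 3.0], [0.5, 2.0], [1.0, 1.0], [1.5, 0.0]])

s = min_max_scale(spliced)
u = min_max_scale(unspliced)
gamma, r2 = steady_state_fit(s, u)

# Keep genes whose fit has non-negative gamma and non-negative R^2.
keep = (gamma >= 0) & (r2 >= 0)
print(keep)
```

In this toy example, gene 0 is fitted perfectly and kept, whereas gene 1 gets a negative \(R^2\) (the line through the origin explains the data worse than the mean) and is discarded.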
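# Determine velocity-informative genes on a copy, so adata itself is not filtered yet
velocity_genes = rgv.pp.preprocess_data(adata.copy()).var_names.tolist()
# select TFs that regulate at least one gene
TF = adata.var_names[adata.uns["skeleton"].sum(1) != 0]
# Keep the union of TFs and velocity-informative genes
var_mask = np.union1d(TF, velocity_genes)
adata = adata[:, var_mask].copy()
# Refine the GRN so that each gene has at least one regulator
adata = rgv.pp.filter_genes(adata)
# Preprocess the remaining genes without filtering on R^2 again
adata = rgv.pp.preprocess_data(adata, filter_on_r2=False)
# Flag velocity-informative genes and TFs in adata.var
adata.var["velocity_genes"] = adata.var_names.isin(velocity_genes)
adata.var["TF"] = adata.var_names.isin(TF)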
adata
computing velocities
finished (0:00:01) --> added
'velocity', velocity vectors for each individual cell (adata.layers)
Number of genes: 684
Number of genes: 628
Number of genes: 623
AnnData object with n_obs × n_vars = 49469 × 623
obs: 'background_fraction', 'cell_probability', 'cell_size', 'droplet_efficiency', 'assignment', 'scDblFinder_DropletType', 'scDblFinder_Score', 'scrublet_DropletType', 'Tissue', 'batch', 'Experiment', 'Type', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'log1p_total_counts_hb', 'pct_counts_hb', 'scrublet_score', 'scrublet_cluster_score_sample', 'scrublet_bh_pval_sample', 'background_fraction_cluster_score_sample', 'background_fraction_bh_pval_sample', 'paper_code', 'method', 'method2', 'FACS', 'stage', 'pcw_cont', 'bulk_name', '10X_run', 'n_genes', 'S_score', 'G2M_score', 'phase', 'leiden_res1', 'leiden_res1_R', 'Celltypist_DHB_predicted_labels', 'Celltypist_DHB_over_clustering', 'Celltypist_DHB_majority_voting', 'Celltypist_DHB_conf_score', 'Celltypist_GSE155121_full_predicted_labels', 'Celltypist_GSE155121_full_over_clustering', 'Celltypist_GSE155121_full_majority_voting', 'Celltypist_GSE155121_full_conf_score', 'Celltypist_GSE157329_developmental_system_full_predicted_labels', 'Celltypist_GSE157329_developmental_system_full_over_clustering', 'Celltypist_GSE157329_developmental_system_full_majority_voting', 'Celltypist_GSE157329_developmental_system_full_conf_score', 'Celltypist_GSE157329_annotation_full_predicted_labels', 'Celltypist_GSE157329_annotation_full_over_clustering', 'Celltypist_GSE157329_annotation_full_majority_voting', 'Celltypist_GSE157329_annotation_full_conf_score', 'Celltypist_GSE157329_final_annotation_full_predicted_labels', 'Celltypist_GSE157329_final_annotation_full_over_clustering', 'Celltypist_GSE157329_final_annotation_full_majority_voting', 'Celltypist_GSE157329_final_annotation_full_conf_score', 'STEMS_annotation_l1', '_scvi_batch', '_scvi_labels', 'leiden_SCVI', 'Celltypist_Immune_All_High_predicted_labels', 
'Celltypist_Immune_All_High_over_clustering', 'Celltypist_Immune_All_High_majority_voting', 'Celltypist_Immune_All_High_conf_score', 'Celltypist_Immune_All_Low_predicted_labels', 'Celltypist_Immune_All_Low_over_clustering', 'Celltypist_Immune_All_Low_majority_voting', 'Celltypist_Immune_All_Low_conf_score', 'Teichmann_Celltype_fig1_full_predicted_labels', 'Teichmann_Celltype_fig1_full_over_clustering', 'Teichmann_Celltype_fig1_full_majority_voting', 'Teichmann_Celltype_fig1_full_conf_score', 'Teichmann_bone_full_predicted_labels', 'Teichmann_bone_full_over_clustering', 'Teichmann_bone_full_majority_voting', 'Teichmann_bone_full_conf_score', 'Teichmann_anatomical_site_full_predicted_labels', 'Teichmann_anatomical_site_full_over_clustering', 'Teichmann_anatomical_site_full_majority_voting', 'Teichmann_anatomical_site_full_conf_score', '_scvi_raw_norm_scaling', 'STEMS_annotation_l2', 'STEMS_annotation_l3', 'CellClass', 'CellCycleFraction', 'Clusters', 'Donor', 'DoubletFlag', 'DoubletScore', 'DropletClass', 'MitoFraction', 'NGenes', 'PrevClusters', 'Region', 'Sex', 'Subdivision', 'Subregion', 'TopLevelCluster', 'TotalUMIs', 'UnsplicedFraction', 'ValidCells', 'MB_Annotation_mb', 'MB_Clusters', 'MB_TopLevelCluster', 'vMB_Clusters', 'vMB_LRprediction_labels', 'CellClass_Subregion', 'MB_regvelo_annotation', 'vMB_regvelo_annotation', 'reference', 'STEMS_annotation_l2_SCANVI', 'STEMS_annotation_l2_prediction', 'MB_regvelo_annotation_SCANVI', 'MB_regvelo_annotation_prediction', 'vMB_regvelo_annotation_SCANVI', 'vMB_regvelo_annotation_prediction', 'CellClass_Subregion_SCANVI', 'CellClass_Subregion_prediction', 'leiden', 'batch_hvg', 'regvelo_annotation', 'regvelo_state', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts'
var: 'gene_id', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'highly_variable', 'velocity_genes', 'TF'
uns: 'CellClass_Subregion_colors', 'Experiment_colors', 'MB_regvelo_annotation_colors', 'MB_regvelo_annotation_prediction_colors', 'STEMS_annotation_l2_colors', '_scvi_manager_uuid', '_scvi_uuid', 'batch_colors', 'hvg', 'leiden', 'leiden_SCVI', 'log1p', 'neighbors', 'pca', 'reference_colors', 'regvelo_annotation_colors', 'regvelo_state_colors', 'tsne', 'umap', 'vMB_regvelo_annotation_colors', 'vMB_regvelo_annotation_prediction_colors', 'CellClass_colors', 'regulators', 'targets', 'skeleton', 'network'
obsm: 'X_Embedding', 'X_Factors', 'X_mde_scanvi_CellClass_Subregion', 'X_mde_scanvi_MB_regvelo_annotation', 'X_mde_scanvi_STEMS_annotation_l2', 'X_mde_scanvi_vMB_regvelo_annotation', 'X_pca', 'X_scANVI_CellClass_Subregion', 'X_scANVI_MB_regvelo_annotation', 'X_scANVI_STEMS_annotation_l2', 'X_scANVI_vMB_regvelo_annotation', 'X_scVI', 'X_scVI_mde', 'X_tsne', 'X_umap', '_scvi_extra_categorical_covs', 'gene_expression_encoding'
layers: 'lognorm', 'spliced', 'unspliced', 'Ms', 'Mu'
obsp: 'connectivities', 'distances'
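The repeated "Number of genes" lines in the output above suggest that the GRN refinement proceeds in rounds: removing a gene without a regulator can leave one of its targets without a regulator as well. The sketch below illustrates such an iterative pruning on a toy network. The helper `prune_unregulated` is hypothetical, assumes rows are regulators and columns are targets, and is not the implementation of rgv.pp.filter_genes.

```python
import pandas as pd


def prune_unregulated(skeleton: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: repeatedly drop genes with no incoming edge.

    Assumption of this sketch: rows are regulators, columns are targets.
    """
    while True:
        has_regulator = skeleton.sum(axis=0) > 0
        if has_regulator.all():
            return skeleton
        keep = skeleton.columns[has_regulator]
        skeleton = skeleton.loc[keep, keep]
        print(f"Number of genes: {skeleton.shape[0]}")


# Toy network: A <-> B regulate each other, and C -> D.
genes = ["A", "B", "C", "D"]
skeleton = pd.DataFrame(0, index=genes, columns=genes)
skeleton.loc["A", "B"] = 1
skeleton.loc["B", "A"] = 1
skeleton.loc["C", "D"] = 1

pruned = prune_unregulated(skeleton)
print(pruned)
```

In the first round C is removed because nothing regulates it; this leaves D without a regulator, so D is removed in the second round, and only A and B remain.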
The data is now preprocessed and we can proceed to comparing different RegVelo model setups in the next tutorial!
Note
The preprocessed data can also be directly accessed via rgv.datasets.hindbrain(data_type = "preprocessed").