Single-Omics alignment of mouse brain of H3K27ac+RNA with ATAC+RNAmouse brain

  • In this tutorial, we will demonstrate how to implement mouse brain of H3K27ac+RNA with ATAC+RNA alignment using 3d-OT and calculate the chamfer distance

  • We collected the data from from AtlasXplore Raw data.

Loading package

[1]:
from lib_3d_OT.utils import *
import scanpy as sc
import numpy as np
import pandas as pd
import torch
from lib_3d_OT.single_modialty import *
import torch.optim as optim
import warnings
warnings.filterwarnings("ignore")
R[write to console]:                    __           __
   ____ ___  _____/ /_  _______/ /_
  / __ `__ \/ ___/ / / / / ___/ __/
 / / / / / / /__/ / /_/ (__  ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/   version 6.1.1
Type 'citation("mclust")' for citing this R package in publications.

[2]:
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")

Loading and Pre-processing two slices

  • First, we need to prepare the single cell spatial data into AnnData objects. AnnData is the standard data class we use in 3d-OT.

  • See documentation for more details if you are unfamiliar, including how to construct AnnData objects from scratch, and how to read data in other formats (csv, mtx, loom, etc.) into AnnData objects.

  • dpca is the preprocessing process for reference SLAT

  • Use multi-omics clustering results as alignment label

[3]:
adata3=sc.read_h5ad('/home/dbj/mouse/spatialglue_alldata/Dataset9_Mouse_Brain_H3K27ac/3d-OT.h5ad')
adata4=sc.read_h5ad('/home/dbj/mouse/spatialglue_alldata/Dataset7_Mouse_Brain_ATAC/3d-OT.h5ad')
[4]:
adata1=sc.read_h5ad('/home/dbj/mouse/spatialglue_alldata/Dataset9_Mouse_Brain_H3K27ac/adata_RNA.h5ad')
adata2=sc.read_h5ad('/home/dbj/mouse/SpatialGlue/Mouse_Brain/adata_RNA.h5ad')
adata1=adata1[adata1.obs_names.isin(adata3.obs_names)]

adata1.obsm['spatial'] = adata1.obsm['spatial'][:, [1, 0]]
adata1.obsm['spatial'][:, 1] = -adata1.obsm['spatial'][:, 1]
adata1.obs['truth'] = adata3.obs['3d-OT'].astype(str) + '_2'
adata2=adata2[adata2.obs_names.isin(adata4.obs_names)]
adata2.obs['truth']=adata4.obs['3d-OT'].astype(str) + '_1'
[5]:
adatalist=[adata1,adata2]
adata1,adata2=dpca(adatalist,n_comps=50,join='inner')

Constructing neighbor graph and training the Pointnet++Encoder

We first build the neighbor graph of graph1 of ATAC+RNA slice and train the encoder to get a trained encoder best_model1

[6]:
set_seed(7)
graph1 = prepare_data(adata2, location="spatial", nb_neighbors=6).to(device)
input_dim1 = graph1.express.shape[-1]
model1 = extractMODEL(args=None,input_dim=input_dim1)
optimizer = optim.Adam(model1.parameters(), lr=0.001)
best_model1, min_loss = train_graph_extractor(graph1, model1, optimizer, device,epochs=800)
Epoch 800/800, Loss: 1.435392, Min Loss: 1.424627

Constructing neighbor graph and training the Pointnet++Encoder

We then build the neighbor graph of graph2 of H3K27ac+RNA slice and train the encoder to get a trained encoder best_model2

[7]:
set_seed(7)
graph2 = prepare_data(adata1, location="spatial", nb_neighbors=6).to(device)
input_dim2 = graph2.express.shape[-1]
model2 = extractMODEL(args=None, input_dim=input_dim2)
optimizer2 = optim.Adam(model2.parameters(), lr=0.001)
best_model2,min_loss = train_graph_extractor(graph2, model2,optimizer2, device, epochs=800)
Epoch 800/800, Loss: 1.412709, Min Loss: 1.406994

Training the optimal transport module

Enter graph1 and graph2 and the two encoders we trained into the optimal transport model

[8]:
pclouds_list=[graph1,graph2]
[9]:
input_dim1 = pclouds_list[0].express.shape[-1]
input_dim2 = pclouds_list[1].express.shape[-1]
model = UnifiedModel(input_dim1=input_dim1,input_dim2=input_dim2,simk=5,otk=300,reconk=2,best_encoder1=best_model1,best_encoder2=best_model2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
lr_lambda = lambda epoch: 1.0 if epoch < 340 else 1.0
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
args = {
    "backward_dist_weight":1.0,
    "use_smooth_flow":1,
    "smooth_flow_loss_weight":1.0,
    "use_div_flow":1,
    "div_flow_loss_weight":1.0,
    "div_neighbor": 8,
    "lattice_steps": 10,
    "nb_neigh_smooth_flow":32,
}



train(model=model,pcloud_list=pclouds_list,optimizer=optimizer,scheduler=scheduler,device=device,nb_epochs=1,use_corr_conf=False,use_smooth_flow=True,use_div_flow=True,args=args)
Time Pair 0,total_loss: 0.1051,smooth_flow_loss: 0.0744 Target Recon Loss: 0.00006760,Div Flow Loss: 0.0306

Target align slice truth

[10]:
sc.pl.embedding(adata1,basis='spatial',color='truth',size=50)
../_images/Mouse_brain_p22_alignment_mousebrain_align_16_0.png

source align slice truth

[11]:
sc.pl.embedding(adata2,basis='spatial',color='truth',size=50)
../_images/Mouse_brain_p22_alignment_mousebrain_align_18_0.png

Visualize and quantify the evaluation of seven region alignment results

  • selected_cell_type represents the drawn source cell type.

  • finaltruth means that the target cell type corresponding to the source cell type that based on the biological understanding, and it is used to obtain the spatial location information of the target cell type and calculate the chamfer distance.

  • all_arrow_ends represents all aligned flow end positions from source cell type,it is used to calculate the chamfer distance.

  • layer_1_pcloud_3D represents the target cell type spatial position information based on biological understanding, and is used to calculate the chamfer distance.

  • The 8_2 and 12_2 regions from slice H3K27ac+RNA sections are considered by us as alignment targets from the ATAC+RNA slice 6_1 region.

[12]:
from lib_3d_OT.plot import *
all_arrow_ends,layer_1_pcloud_3D=plot_selected_cell_type_flow(pclouds_list, model, device,selected_cell_type='6_1',finaltruth=['8_2','12_2'],xlim=(-0.1, 1.1),ylim=(-0.1, 1.1),height_scale=1,size=1,alpha=0.4,
    #save_path='/home/dbj/mouse/spatialglue_alldata/multialign/RNA-RNA'
)
Number of arrow ends: 1171
Layer 1 points count: 1135
../_images/Mouse_brain_p22_alignment_mousebrain_align_20_1.png

-Log10(chamfer_distance) as a performance metric for alignment

[13]:
chamfer_dist = chamfer_distance(all_arrow_ends,layer_1_pcloud_3D)

print(f"chamfer distance: {chamfer_dist}")
chamfer distance: 0.0003938377828449989
[14]:
from lib_3d_OT.plot import *
all_arrow_ends,layer_1_pcloud_3D=plot_selected_cell_type_flow(pclouds_list, model, device,selected_cell_type='10_1',finaltruth=['6_2'],xlim=(-0.1, 1.1),ylim=(-0.1, 1.1),height_scale=1,size=1,alpha=0.4,
    #save_path='/home/dbj/DPLFC/'
)
Number of arrow ends: 539
Layer 1 points count: 650
../_images/Mouse_brain_p22_alignment_mousebrain_align_23_1.png
[15]:
chamfer_dist = chamfer_distance(all_arrow_ends,layer_1_pcloud_3D)

print(f"chamfer distance: {chamfer_dist}")
chamfer distance: 0.000788768230685106
[16]:
from lib_3d_OT.plot import *
all_arrow_ends,layer_1_pcloud_3D=plot_selected_cell_type_flow(pclouds_list, model, device,selected_cell_type='2_1',finaltruth=['2_2'],xlim=(-0.1, 1.1),ylim=(-0.1, 1.1),height_scale=1,size=1,alpha=0.4,
    #save_path='/home/dbj/DPLFC/'
)
Number of arrow ends: 832
Layer 1 points count: 749
../_images/Mouse_brain_p22_alignment_mousebrain_align_25_1.png
[17]:
chamfer_dist = chamfer_distance(all_arrow_ends,layer_1_pcloud_3D)

print(f"chamfer distance: {chamfer_dist}")
chamfer distance: 0.0012000246409342786
[18]:
from lib_3d_OT.plot import *
all_arrow_ends,layer_1_pcloud_3D=plot_selected_cell_type_flow(pclouds_list, model, device,selected_cell_type='4_1',finaltruth=['4_2'],xlim=(-0.1, 1.1),ylim=(-0.1, 1.1),height_scale=1,size=1,alpha=0.4,
    #save_path='/home/dbj/DPLFC/'
)
Number of arrow ends: 535
Layer 1 points count: 764
../_images/Mouse_brain_p22_alignment_mousebrain_align_27_1.png
[19]:
chamfer_dist = chamfer_distance(all_arrow_ends,layer_1_pcloud_3D)

print(f"chamfer distance: {chamfer_dist}")
chamfer distance: 0.001000483049363654