Spatial multi-Omics Clustering Demonstration of Human_Lymph_Node A1
In this Tutorial, we demonstrate how to use 3d-OT to obtain the clustering results of Human_Lymph_Node A1.
Five simulated data consisting of two Omics that together contained the information of the ground truth.
The Omics were designed to simulate the transcriptome, and proteome respectively.
Loading package
[1]:
from lib_3d_OT.utils import *
import scanpy as sc
import numpy as np
import pandas as pd
import torch
from lib_3d_OT.multi_modialty import *
import torch.optim as optim
import warnings
warnings.filterwarnings("ignore")
During startup - Warning messages:
1: package ‘methods’ was built under R version 4.3.2
2: package ‘datasets’ was built under R version 4.3.2
3: package ‘utils’ was built under R version 4.3.2
4: package ‘grDevices’ was built under R version 4.3.2
5: package ‘graphics’ was built under R version 4.3.2
6: package ‘stats’ was built under R version 4.3.2
R[write to console]: __ __
____ ___ _____/ /_ _______/ /_
/ __ `__ \/ ___/ / / / / ___/ __/
/ / / / / / /__/ / /_/ (__ ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/ version 6.1.1
Type 'citation("mclust")' for citing this R package in publications.
[ ]:
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
Loading data
[2]:
truth=pd.read_csv('/home/dbj/mouse/SpatialGlue/Human_Lymph_Node/annotation.csv',index_col=0)
file_fold = '/home/dbj/mouse/SpatialGlue/Human_Lymph_Node/' #please replace 'file_fold' with the download path
adata_omics1 = sc.read_h5ad(file_fold + 'adata_RNA.h5ad')
adata_omics2 = sc.read_h5ad(file_fold + 'adata_ADT.h5ad')
adata_omics1.obs['truth']=truth['manual-anno']
adata_omics2.obs['truth']=truth['manual-anno']
adata_omics1.var_names_make_unique()
adata_omics2.var_names_make_unique()
3d-OT adopts standard pre-processing steps for the transcriptomic, and protein data.
Specifically, for the transcriptomics data, the gene expression counts are log-transformed and normalized by library size via the SCANPY package. The top 3,000 highly variable genes (HVGs) are selected as input of PCA for dimension reduction.
The protein expresssion counts are normliazed using CLR (Centered Log Ratio).
[3]:
sc.pp.filter_genes(adata_omics1, min_cells=10)
sc.pp.highly_variable_genes(adata_omics1, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata_omics1, target_sum=1e4)
sc.pp.log1p(adata_omics1)
sc.pp.scale(adata_omics1)
adata_omics1_high = adata_omics1[:, adata_omics1.var['highly_variable']]
adata_omics1.obsm['feat'] = pca(adata_omics1_high, n_comps=adata_omics2.n_vars-1)
# Protein
adata_omics2 = clr_normalize_each_cell(adata_omics2)
sc.pp.scale(adata_omics2)
adata_omics2.obsm['feat'] = pca(adata_omics2, n_comps=adata_omics2.n_vars-1)
Constructing the neighbor graph and training the PointNet++ Encoder
[6]:
set_seed(7)
graph1 = prepare_data(adata_omics1, location="spatial", nb_neighbors=6).to(device)
graph2 = prepare_data(adata_omics2, location="spatial", nb_neighbors=6).to(device)
input_dim = graph1.express.shape[-1]
hidden_dim=32
model = extractModel(input_dim,hidden_dim,n_heads=4, n_layers=3)
model = model.to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
start_time = time.time()
best_model = train_graph_extractor(graph1,graph2, model ,optimizer, device,epochs=500)
Epoch 500/500, Loss: 1.1671, Loss1: 0.9481, Loss2: 0.2190
Get multi-omics integrated representation mixed_modal_features
[7]:
with torch.no_grad():
MODEL=extractModel(input_dim=input_dim,hidden_dim=hidden_dim)
MODEL.to(device)
MODEL.load_state_dict(best_model)
MODEL.eval()
recon1, recon2,mixed_modal_features = MODEL(graph1, graph2)
mixed_modal_features = mixed_modal_features.squeeze(0)
gene_expression_matrix = mixed_modal_features.cpu().detach().numpy()
adata_omics1.obsm['3d-OT']=gene_expression_matrix
We use mclust for clustering
[10]:
clustering(adata_omics1, n_clusters=10,key='3d-OT', method='mclust',random=2024,n_comp=20)
Using 3d-OT representation for clustering...
fitting ...
|======================================================================| 100%
The clustering result
[12]:
import matplotlib.pyplot as plt
import copy
plt.rcParams['figure.figsize'] = (4,4)
plt.rcParams['font.size'] = 20
fig, ax = plt.subplots()
sc.pl.embedding(adata_omics1,basis='spatial',color='3d-OT',size=50,ax=ax)
Calculation of six types of supervision indicators
[13]:
import numpy as np
from sklearn.metrics import (homogeneity_score,mutual_info_score,v_measure_score,adjusted_mutual_info_score,normalized_mutual_info_score,adjusted_rand_score
)
true_labels = np.array(adata_omics1.obs['truth'])
cluster_labels = np.array(adata_omics1.obs['3d-OT'])
homogeneity = homogeneity_score(true_labels, cluster_labels)
mutual_info = mutual_info_score(true_labels, cluster_labels)
v_measure = v_measure_score(true_labels, cluster_labels)
ami = adjusted_mutual_info_score(true_labels, cluster_labels)
nmi = normalized_mutual_info_score(true_labels, cluster_labels)
ari = adjusted_rand_score(true_labels, cluster_labels)
print("Homogeneity:", homogeneity)
print("Mutual Information:", mutual_info)
print("V-Measure:", v_measure)
print("AMI:", ami)
print("NMI:", nmi)
print("ARI:", ari)
Homogeneity: 0.4099481619252866
Mutual Information: 0.6915841864265078
V-Measure: 0.39043921332436543
AMI: 0.38625401019068556
NMI: 0.39043921332436543
ARI: 0.256259941977966