scITD: a tensor decomposition tool for single-cell analysis

凑齐六个字吧

Published 2025-10-12 22:13:28

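Biological processes at the tissue and organism level usually involve the coordinated action of many different cell types. However, current computational methods for analyzing single-cell RNA sequencing (scRNA-seq) data are not designed to capture coordinated changes in cell state across samples, partly because most scRNA-seq datasets contain only a limited number of biological samples.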

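Recent advances in sample multiplexing have made population-scale single-cell transcriptomic measurements feasible. To take full advantage of such datasets, the authors of the referenced study propose a new computational method, single-cell Interpretable Tensor Decomposition (scITD).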

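The method extracts "multicellular gene expression patterns" that vary across biological samples. These patterns show how changes in one cell type are linked to changes in other cell types. They can also be associated with known covariates (for example disease status, treatment, or technical batch effects) and used to stratify heterogeneous samples.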

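As examples, the authors validated scITD on in vitro experimental data and simulated data, and then applied it to scRNA-seq data from peripheral blood mononuclear cells (PBMCs) of 115 patients with systemic lupus erythematosus and 56 healthy controls. This revealed a cross-cell-type interferon signature that was significantly associated with the presence of anti-dsDNA autoantibodies and with the disease activity index. They further identified a new multicellular pattern that may promote renal involvement in patients with anti-dsDNA antibodies. This pattern is characterized by expansion of activated memory B cells together with helper T cell activation, and is hypothesized to be mediated by enhanced ICOSLG–ICOS interaction between monocytes and helper T cells. Finally, applying scITD to PBMC datasets from two COVID-19 cohorts identified reproducible multicellular patterns that stratify patients by disease severity. Overall, scITD is a flexible method for exploring coordinated variation in cell states across multi-sample single-cell datasets, revealing the complex, non-cell-autonomous dependencies that define and stratify disease.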
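To make the idea of a "multicellular pattern" more concrete, here is a minimal base R sketch on toy data. It arranges values as a donors x genes x cell types array, plants one pattern that spans two cell types, and recovers it with an SVD of the donor-mode unfolding. This is a simpler stand-in chosen for illustration, not scITD's actual algorithm (scITD runs a Tucker decomposition followed by a rotation, as shown in the workflow below); all names and numbers here are made up.

```r
# Toy illustration in base R; NOT scITD's implementation.
set.seed(1)
n_donors <- 6; n_genes <- 4; n_ctypes <- 3

# Low-level noise arranged as a donors x genes x cell types tensor
tnsr <- array(rnorm(n_donors * n_genes * n_ctypes, sd = 0.1),
              dim = c(n_donors, n_genes, n_ctypes),
              dimnames = list(paste0("donor", 1:n_donors),
                              paste0("gene", 1:n_genes),
                              paste0("ctype", 1:n_ctypes)))

# Plant one multicellular pattern: donors 4-6 have gene1 up in ctype1
# and gene2 down in ctype3; donors 1-3 show the opposite
donor_effect <- c(-1, -1, -1, 1, 1, 1)
pattern <- matrix(0, n_genes, n_ctypes)
pattern[1, 1] <- 3
pattern[2, 3] <- -3
tnsr <- tnsr + outer(donor_effect, pattern)

# Unfold along the donor mode: one row per donor,
# one column per gene/cell-type pair, then center each column
unfolded <- matrix(tnsr, nrow = n_donors)
unfolded <- scale(unfolded, center = TRUE, scale = FALSE)

# The leading singular vectors give donor scores and a genes x cell types loading matrix
sv <- svd(unfolded)
donor_scores <- setNames(sv$u[, 1], dimnames(tnsr)[[1]])
loadings <- matrix(sv$v[, 1] * sv$d[1], nrow = n_genes,
                   dimnames = dimnames(tnsr)[2:3])

print(round(donor_scores, 2))
print(round(loadings, 2))
```

The donor scores separate donors 1-3 from donors 4-6, and the two largest loadings sit in different cell types, which is what "a change in one cell type linked to a change in another" looks like numerically. The sign of the scores and loadings is arbitrary, as with any SVD.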

Analysis workflow

1. Import

library(scITD)
library(qs)
library(Seurat)

# Load the Seurat object
sce <- qread("./sc_dataset.qs")
#DefaultAssay(sce) <- "RNA"

# Raw counts matrix (the `layer` argument needs SeuratObject >= 5.0; older versions use `slot = "counts"`)
counts <- GetAssayData(sce, layer = "counts")

# Cell-level metadata
meta <- sce@meta.data

# ensembl to gene name conversions
#feature.names <- readRDS('/home/jmitchel/data/van_der_wijst/genes.rds')

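2. Data preprocessing

# The metadata must contain the columns `donors` and `ctypes`
meta$donors <- meta$orig.ident
meta$ctypes <- meta$celltype

# Check the cell type labels and the number of cells per type
table(sce$celltype)

# Set the parameters; the names in ctypes_use must match the labels in meta$ctypes
param_list <- initialize_params(ctypes_use = c("epithelial/cancer cells",
                                               "T/NK cells",
                                               "B/plasma cells",
                                               "dendritic cells",
                                               "endothelial cells",
                                               "mast cells", "myeloid cells",
                                               "fibroblasts"),
                                ncores = 30, rand_seed = 10) # adjust ncores to your machine

# Create the project container
container <- make_new_container(count_data = counts,
                                meta_data = meta,
                                gn_convert = NULL, # feature.names
                                params = param_list,
                                label_donor_sex = FALSE)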
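A quick sanity check on the metadata, ideally run before `make_new_container`, can catch misspelled cell type labels and donors with too few cells. The sketch below uses a small made-up data frame in place of `sce@meta.data`. It assumes that a donor needs at least `donor_min_cells` cells in every selected cell type to be kept when the tensor is formed in the next step; that reading of the parameter is my assumption and is worth confirming in the scITD documentation.

```r
# Toy metadata standing in for sce@meta.data (hypothetical values)
meta <- data.frame(
  orig.ident = rep(c("D1", "D2", "D3"), times = c(12, 9, 7)),
  celltype   = c(rep("T/NK cells", 6), rep("fibroblasts", 6),
                 rep("T/NK cells", 5), rep("fibroblasts", 4),
                 rep("T/NK cells", 5), rep("fibroblasts", 2)),
  stringsAsFactors = FALSE
)
meta$donors <- meta$orig.ident
meta$ctypes <- meta$celltype

ctypes_use <- c("T/NK cells", "fibroblasts")
donor_min_cells <- 5  # same value as passed to form_tensor in the next step

# Both required columns must be present
stopifnot(all(c("donors", "ctypes") %in% colnames(meta)))

# Any requested cell type that does not occur in the metadata (e.g. a typo)
missing_ctypes <- setdiff(ctypes_use, unique(meta$ctypes))

# Cells per donor and cell type, restricted to the requested cell types
cell_counts <- table(meta$donors, meta$ctypes)[, ctypes_use, drop = FALSE]

# Donors with enough cells in every requested cell type (assumed retention rule)
donors_kept <- rownames(cell_counts)[apply(cell_counts >= donor_min_cells, 1, all)]

print(missing_ctypes)
print(cell_counts)
print(donors_kept)
```

If many donors fail this check for a rare cell type, dropping that cell type from `ctypes_use` is one way to keep more donors in the analysis.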

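3. Identify the genes to include in downstream analysis

sce_container <- form_tensor(container,
                             donor_min_cells = 5,
                             norm_method = 'trim',
                             scale_factor = 10000,
                             vargenes_method = 'norm_var_pvals',
                             vargenes_thresh = .5, # increase this value if too few genes are selected
                             scale_var = TRUE,
                             var_scale_power = 2) # typically 0.5-2

# Check the number of overdispersed genes to make sure the tensor contains enough genes before running the decomposition
print(length(sce_container[["all_vargenes"]]))
# 764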

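4. Run the tensor decomposition

# rotation_type = 'hybrid' optimizes the factor loadings with the hybrid rotation method. This makes the gene patterns more modular/independent of each other and so improves the interpretability of the factors.
# ranks = c(5, 10) extracts 5 multicellular processes (factors) that vary across donors, using 10 gene sets.
sce_container <- run_tucker_ica(sce_container,
                                ranks = c(5, 10),
                                tucker_type = 'regular',
                                rotation_type = 'hybrid')

# Test associations between the donor scores and metadata variables
# ('Gender' and 'hpv' are metadata columns of this dataset)
sce_container <- get_meta_associations(sce_container,
                                       vars_test = c('Gender', 'hpv'),
                                       stat_use = 'pval')

# Plot the donor scores
sce_container <- plot_donor_matrix(sce_container,
                                   meta_vars = c('Gender', 'hpv'),
                                   show_donor_ids = TRUE,
                                   add_meta_associations = 'pval')

# Show the heatmap
sce_container$plots$donor_matrix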
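To give a feel for what a factor-to-covariate association p-value means, the sketch below fits one linear model per factor (donor score against a covariate) on toy data and extracts the overall F-test p-value. This is only an illustration of the general idea. Whether `get_meta_associations` computes its p-values in exactly this way is an assumption on my part, and the scores, the `hpv` labels, and the helper `lm_pval` are all hypothetical.

```r
set.seed(10)
n <- 20

# Hypothetical donor-level covariate
donor_meta <- data.frame(hpv = factor(rep(c("neg", "pos"), each = n / 2)))

# Hypothetical donor scores: Factor1 is shifted by hpv status, Factor2 is pure noise
scores <- cbind(
  Factor1 = rnorm(n, sd = 0.1) + ifelse(donor_meta$hpv == "pos", 0.5, -0.5),
  Factor2 = rnorm(n, sd = 0.1)
)

# Overall F-test p-value of the linear model y ~ x (helper defined here for illustration)
lm_pval <- function(y, x) {
  fstat <- summary(lm(y ~ x))$fstatistic
  unname(pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE))
}

# One p-value per factor
pvals <- apply(scores, 2, lm_pval, x = donor_meta$hpv)
print(pvals)
```

With several factors and several covariates, the resulting p-values are usually adjusted for multiple testing (for example with `p.adjust`) before being interpreted.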

5. Heatmaps of the top genes for each factor

# Identify the significant genes
sce_container <- get_lm_pvals(sce_container)
# Generate the loadings plots
sce_container <- get_all_lds_factor_plots(sce_container,
                                          use_sig_only = TRUE,
                                          nonsig_to_zero = TRUE,
                                          sig_thresh = .05, # this threshold can be adjusted
                                          display_genes = FALSE,
                                          gene_callouts = TRUE,
                                          callout_n_gene_per_ctype = 3,
                                          show_var_explained = TRUE)

# Arrange the plots into a single figure and display it
myfig <- render_multi_plots(sce_container, data_type = 'loadings')
myfig

6. Enrichment analysis for each factor

sce_container <- run_gsea_one_factor(sce_container,
                                     factor_select = 1, # choose which factor to test
                                     method = "fgsea",
                                     thresh = 0.05,
                                     db_use = c("GO"),
                                     signed = TRUE)

7. Detailed information for a single factor

# Extract the first factor: element 1 holds the donor scores, element 2 the loadings
f1_data <- get_one_factor(sce_container, factor_select=1)
f1_dscores <- f1_data[[1]]
f1_loadings <- f1_data[[2]]

print(head(f1_dscores))
#              [,1]
# HN76   -0.2501274
# HN60   -0.2958658
# HN77   -0.2872105
# HN46   -0.1661317
# C51     0.7326189
# HN31TS -0.1170771
print(head(f1_loadings))
#            B/plasma cells dendritic cells endothelial cells epithelial/cancer cells fibroblasts mast cells myeloid cells T/NK cells
# AAED1           10.628074       3.0629523         2.7432420              3.25292820   3.5249181  3.1020269    12.7539745 20.9695934
# ABL2             3.914351       5.1056190         9.2485065              3.15065160   7.8121780  3.9127359    25.8230020  1.5070746
# AC004556.1       1.251231       2.5441321         1.2062725              0.07827379   3.0078168  2.1678936     6.7786635  8.6538278
# AC007032.1       0.238599      -0.5046208        -0.5923934             -0.98036824  -0.4249645 -0.1446775    -6.4020708 -1.2324611
# AC007384.1      -2.192128      -1.7933623         0.4326625             -1.35659920  -0.1672751 -0.7426810    -7.1289808 -8.7917116
# AC012236.1      -6.699720      -0.5442649        -0.4120777             -0.56894254   0.1481724 -0.3718322    -0.1880744  0.5493611

# Get assistance with rank determination (typically run before the decomposition in step 4)
# sce_container <- determine_ranks_tucker(sce_container, 
#                                         max_ranks_test=c(1,5), # choose according to your own data
#                                         shuffle_level='cells', 
#                                         num_iter=10, 
#                                         norm_method='trim',
#                                         scale_factor=10000,
#                                         scale_var=TRUE,
#                                         var_scale_power=2)
# 
# sce_container$plots$rank_determination_plot
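The loadings matrix returned by `get_one_factor` (genes in rows, cell types in columns, as in the printed output above) can be explored with plain base R. The sketch below uses a small toy matrix with rounded values in the same shape as `f1_loadings`. It lists the genes with the largest absolute loading in each cell type and finds the cell type carrying the most squared loading. Summing squared loadings is just one simple summary chosen here for illustration, not a statistic defined by scITD.

```r
# Toy loadings matrix (genes x cell types) mimicking the shape of f1_loadings
f1_loadings <- matrix(
  c(10.6,  3.1, 12.8,
     3.9,  5.1, 25.8,
    -2.2, -1.8, -7.1,
    -6.7, -0.5, -0.2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("AAED1", "ABL2", "AC007384.1", "AC012236.1"),
                  c("B/plasma cells", "dendritic cells", "myeloid cells"))
)

# Top 2 genes per cell type, ranked by absolute loading
top_genes <- apply(f1_loadings, 2, function(col) {
  names(sort(abs(col), decreasing = TRUE))[1:2]
})
print(top_genes)

# Cell type with the largest sum of squared loadings for this factor
ctype_weight <- colSums(f1_loadings^2)
dominant_ctype <- names(which.max(ctype_weight))
print(dominant_ctype)
```

On the real object, replacing the toy matrix with `f1_data[[2]]` and raising the number of genes kept per cell type gives a quick gene list per cell type for follow-up.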

References:

Coordinated, multicellular patterns of transcriptional variation that stratify patient cohorts are revealed by tensor decomposition. Nat Biotechnol. 2025 Jul;43(7):1192-1201.

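Note: if you have questions about the content or spot a clear error, please contact the author through the official account backend. For more related content, follow the WeChat official account 生信方舟.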

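Original content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, please contact cloudcommunity@tencent.com for removal.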
