生信漫卷
不务正业的医学狗🐶,持续分享生信笔记📒和实用科研工具🧰、留学申请经验🎁
165篇原创内容
公众号
马上要去做急诊了,感觉医路漫漫,根本看不到头啊!~🫠
今天是GeneNMF包!~👇
GeneNMF 是一个专门针对基因表达数据设计的 非负矩阵分解(NMF) 工具包,旨在从高维组学数据中提取具有生物学意义的基因模块。🧬
rm(list = ls())
#install.packages("GeneNMF") #from CRAN
#remotes::install_github("carmonalab/GeneNMF") # or from GitHub
library(GeneNMF)
library(Seurat)
library(ggplot2)
library(UCell)
library(patchwork)
library(Matrix)
library(RcppML)
library(viridis)
options(timeout=1000)
ddir <- "input"
data.path <- sprintf("%s/pbmc_multimodal.downsampled20k.seurat.rds", ddir)
if (!file.exists(data.path)) {
dir.create(ddir)
dataUrl <- "https://www.dropbox.com/s/akzu3hp4uz2mpkv/pbmc_multimodal.downsampled20k.seurat.rds?dl=1"
download.file(dataUrl, data.path)
}
seu <- readRDS(data.path)
seu
NMF
可用于将数据的维度从数万个基因减少到几个维度(类似于 PCA)。😘
使用RunNMF() 函数
,可以直接应用于Seurat
对象,并将NMF
结果保存为新的降维。🆕
ndim <- 15
seu <- FindVariableFeatures(seu, nfeatures = 1000)
seu <- runNMF(seu, k = ndim, assay="SCT")
seu@reductions$NMF
UMAP
进一步降维!~
seu <- RunUMAP(seu, reduction = "NMF", dims=1:ndim, reduction.name = "NMF_UMAP", reduction.key = "nmfUMAP_")
DimPlot(seu, reduction = "NMF_UMAP", group.by = "celltype.l1", label=T) + theme(aspect.ratio = 1,
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank()) + ggtitle("NMF UMAP") + NoLegend()
multiNMF() 函数
会自动对样本列表和多个值执行NMF k
。😏
seu.list <- SplitObject(seu, split.by = "donor")
geneNMF.programs <- multiNMF(seu.list, assay="SCT", slot="data", k=4:9, nfeatures = 1000)
我们现在可以将在多个样本和数量上鉴定的k
整合成metaprograms
(MPs
)。🧐
在这里,我们将定义10
个MP
。😏
geneNMF.metaprograms <- getMetaPrograms(geneNMF.programs,
nMP=10,
weight.explained = 0.7,
max.genes=100)
ph <- plotMetaPrograms(geneNMF.metaprograms)
查看参数!~😘
1️⃣ sample coverage
:检测到MP
的样本比例;
2️⃣ silhouette coefficient
:MP
中的单个程序相对于其他MP
中的程序的相似程度(越高越好!~)
geneNMF.metaprograms$metaprograms.metrics
驱动每个 MP 的基因是什么?😧
一起看下吧!~😘
lapply(geneNMF.metaprograms$metaprograms.genes, head)
library(msigdbr)
library(fgsea)
这里使用的是C8
,cell type signature gene sets
。😁
top_p <- lapply(geneNMF.metaprograms$metaprograms.genes, function(program) {
runGSEA(program, universe=rownames(seu), category = "C8")
})
head(top_p$MP4)
评估从数据中学到的基因程序的一种简单方法是使用 UCell 包计算基因特征分数。😘
mp.genes <- geneNMF.metaprograms$metaprograms.genes
seu <- AddModuleScore_UCell(seu, features = mp.genes, assay="SCT", ncores=4, name = "")
VlnPlot(seu, features=names(mp.genes), group.by = "celltype.l1",
pt.size = 0, ncol=5)
matrix <- seu@meta.data[,names(mp.genes)]
#dimred <- scale(matrix)
dimred <- as.matrix(matrix)
colnames(dimred) <- paste0("MP_",seq(1, ncol(dimred)))
#New dim reduction
seu@reductions[["MPsignatures"]] <- new("DimReduc",
cell.embeddings = dimred,
assay.used = "RNA",
key = "MP_",
global = FALSE)
我们还可以使用这些分数来生成UMAP
并在2D
中可视化数据:
set.seed(123)
seu <- RunUMAP(seu, reduction="MPsignatures", dims=1:length(seu@reductions[["MPsignatures"]]),
metric = "euclidean", reduction.name = "umap_MP")
FeaturePlot(seu, features = names(mp.genes), reduction = "umap_MP", ncol=5) &
scale_color_viridis(option="B") &
theme(aspect.ratio = 1, axis.text=element_blank(), axis.ticks=element_blank())
a <- DimPlot(seu, reduction = "umap_MP", group.by = "celltype.l1", label=T) + theme(aspect.ratio = 1,
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank()) + ggtitle("Original cell types") + NoLegend()
b <- DimPlot(seu, reduction = "umap_MP", group.by = "donor", label=T) + theme(aspect.ratio = 1,
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank()) + ggtitle("Donor") + NoLegend()
a | b