这两天我们讨论了差异分析:TCGA等大样本量差异分析该使用DEseq2还是edgeR呢? 以及:使用DEseq2做转录组测序差异分析的时候顺便去除批次效应,就免不了提一下可视化了:
下面复制粘贴就可以运行的代码
前些天我们的《生信菜鸟团》公众号的一个笔记:一起画个圈圈看差异基因,吸引了大家的注意,有评论说其实没有必要把不同染色体差异基因使用圈圈图展示,简简单单火山图更好。那我们就比较一下吧:
加载airway数据集并转换为表达矩阵,代码如下所示:
# 1.构建表达矩阵 ----------------------------------------------------------------
dir.create("./data")
# 魔幻操作,一键清空
rm(list = ls())
options(stringsAsFactors = F)
# BiocManager::install('airway')
# 加载airway数据集并转换为表达矩阵
library(airway,quietly = T)
data(airway)
class(airway)
rawcount <- assay(airway)
colnames(rawcount)
# 查看表达谱
rawcount[1:4,1:4]
# 去除前的基因表达矩阵情况
dim(rawcount)
# 获取分组信息
group_list <- colData(airway)$dex
group_list
# 过滤在至少在75%的样本中都有表达的基因
keep <- rowSums(rawcount>0) >= floor(0.75*ncol(rawcount))
table(keep)
filter_count <- rawcount[keep,]
filter_count[1:4,1:4]
dim(filter_count)
接下来就可以对这个转录组测序表达量矩阵变量 filter_count 和其分组信息变量 group_list 走DESeq2差异分析流程啦。
# 加载包
library(DESeq2)
# 第一步,构建DESeq2的DESeq对象
colData <- data.frame(row.names=colnames(exprSet),group_list=group_list)
dds <- DESeqDataSetFromMatrix(countData = exprSet,colData = colData,design = ~ group_list)
# 第二步,进行差异表达分析
dds2 <- DESeq(dds)
# 提取差异分析结果,trt组对untrt组的差异分析结果
tmp <- results(dds2,contrast=c("group_list","trt","untrt"))
DEG_DESeq2 <- as.data.frame(tmp[order(tmp$padj),])
head(DEG_DESeq2)
# 去除差异分析结果中包含NA值的行
DEG_DESeq2 = na.omit(DEG_DESeq2)
差异分析很简单的, 但是需要注释一下上下调基因的属性,以及基因的染色体坐标:
# 筛选上下调,设定阈值
fc_cutoff <- 1
fdr <- 0.05
DEG_DESeq2$regulated <- "normal"
loc_up <- intersect(which(DEG_DESeq2$log2FoldChange>log2(fc_cutoff)),
which(DEG_DESeq2$padj<fdr))
loc_down <- intersect(which(DEG_DESeq2$log2FoldChange< (-log2(fc_cutoff))),
which(DEG_DESeq2$padj<fdr))
DEG_DESeq2$regulated[loc_up] <- "up"
DEG_DESeq2$regulated[loc_down] <- "down"
table(DEG_DESeq2$regulated)
head(DEG_DESeq2)
library(AnnoProbe)
ag=annoGene(rownames(DEG_DESeq2),
ID_type = 'ENSEMBL',species = 'human'
)
head(ag)
DEG_DESeq2$ENSEMBL=rownames(DEG_DESeq2)
deg_anno=merge(ag,DEG_DESeq2,by='ENSEMBL')
head(deg_anno)
# 保存
write.table(deg_anno,"./data/DEG_DESeq2_all.xls",
row.names = F,sep="\t",quote = F)
save(deg_anno, file = "./data/Step03-DESeq2_nrDEG.Rdata")
直接参考《生信菜鸟团》公众号的笔记:一起画个圈圈看差异基因,代码如下所示:
rm(list = ls())
options(stringsAsFactors = F)
library(circlize)
load(file = "./data/Step03-DESeq2_nrDEG.Rdata")
load(file = "./data/Step01-airwayData.Rdata")
head(deg_anno)
table(deg_anno$chr)
neworder=c(paste0('chr',1:22),'chrX','chrY','chrM')
deg_anno$chr=factor(deg_anno$chr,neworder,ordered = T)
table(deg_anno$chr)
table(deg_anno$regulated)
bed1=deg_anno[deg_anno$regulated=='up',c('chr','start','end','log2FoldChange')]
colnames(bed1) = c("chr","start","end","value1")
bed2=deg_anno[deg_anno$regulated=='down',c('chr','start','end','log2FoldChange')]
colnames(bed2) = c("chr","start","end","value1")
####3.初始化基因组圈图####
circos.initializeWithIdeogram(species = "hg38",plotType = NULL)
circos.track(ylim = c(0, 1), panel.fun = function(x, y) {
chr = CELL_META$sector.index
xlim = CELL_META$xlim
ylim = CELL_META$ylim
circos.rect(xlim[1], 0, xlim[2], 1, col = rand_color(1))
circos.text(mean(xlim), mean(ylim), chr, cex = 0.7, col = "white",
facing = "inside", niceFacing = TRUE)
}, track.height = 0.15, bg.border = NA)
####4.添加基因基因的圈圈####
#黑色下调,红上调
bed_list = list(bed2,bed1)
circos.genomicTrack(bed_list,
panel.fun = function(region, value, ...) {
i = getI(...)
circos.genomicPoints(region, value, pch = 16, cex = 0.5, col = i, ...)
})
#清除
circos.clear()
如下所示:
圈圈图
代码超级简单:
library(EnhancedVolcano)
colnames(deg_anno)
EnhancedVolcano(deg_anno,
lab = deg_anno$SYMBOL,
x = 'log2FoldChange',
y = 'padj')
出图如下所示:
火山图
你喜欢哪种可视化方式呢?火山图还是圈圈图?