Original paper: https://www.nature.com/articles/s41467-020-17376-1
Last time we reproduced the standard Manhattan plot. This time, let's roll it up into a circle!
Today we reproduce a circular Manhattan plot from a Nature Communications paper.
rm(list = ls())
library(CMplot)
data(pig60K)
data(cattle50K)
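Note! In the example data, the first three columns are the SNP name, chromosome, and position. Each remaining column holds the GWAS p-values, or the GS/GP values, for one trait. Sometimes you only want to plot SNP density, and in that case the first three columns are enough.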
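To make that layout concrete, here is a minimal sketch with a made-up data frame. The SNP names, positions, and p-values are invented for illustration and are not taken from pig60K. It shows the three mandatory columns followed by one column per trait, and how to keep only the first three columns for a density-only plot.

```r
# Toy data in the layout described above; every value here is hypothetical.
toy_gwas <- data.frame(
  SNP        = c("snp1", "snp2", "snp3", "snp4"),
  Chromosome = c("1", "1", "2", "X"),
  Position   = c(15000, 820000, 43000, 910000),
  trait1     = c(0.52, 3e-7, 0.08, 0.91),  # GWAS p-values for trait 1
  trait2     = c(0.33, 0.47, 2e-5, 0.12),  # GWAS p-values for trait 2
  stringsAsFactors = FALSE
)

# For a density-only plot, the name/chromosome/position columns are sufficient.
toy_map <- toy_gwas[, 1:3]
str(toy_map)
```

The same subsetting idea, `pig60K[, 1:3]`, should apply to the real example data.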
The previous post did not draw a plain SNP density plot, so let's fill that gap here.
CMplot(pig60K,
       type="l",             # "p" (points), "l" (lines), "h" (vertical lines)
       plot.type="d",        # "d" = SNP density plot; see the option list below
       bin.size=1e6,         # window size (bp) used to count SNPs
       chr.den.col=c("darkgreen", "yellow", "red"),  # colour gradient for SNP density
       file="jpg",           # output format: "jpg", "pdf", or "tiff"
       memo="",
       dpi=300,
       main="Illumina_60K",  # plot title
       file.output=FALSE,    # draw on the current device instead of writing a file
       verbose=FALSE,
       width=9, height=6
)
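What does "SNP density" mean in practice? Roughly, the number of SNPs that fall into each window of `bin.size` base pairs along a chromosome. The sketch below counts SNPs per 1 Mb window on made-up positions. It is only meant to illustrate the idea, and it is not CMplot's internal implementation; `snp_map` and its values are hypothetical.

```r
# Hypothetical SNP positions on two chromosomes (not real data).
snp_map <- data.frame(
  SNP        = paste0("snp", 1:6),
  Chromosome = c("1", "1", "1", "1", "2", "2"),
  Position   = c(100000, 900000, 1200000, 2500000, 300000, 400000),
  stringsAsFactors = FALSE
)

bin_size <- 1e6  # 1 Mb windows, matching bin.size above

# Index of the window each SNP falls into (window 0 covers [0, 1 Mb)).
snp_map$bin <- floor(snp_map$Position / bin_size)

# Cross-tabulate: rows are chromosomes, columns are windows, cells are SNP counts.
density <- table(chromosome = snp_map$Chromosome, bin = snp_map$bin)
print(density)
```

A smaller `bin_size` gives a finer but noisier density track, while a larger one smooths it out.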
A quick word on plot.type. The available options are:
✅ "d" → SNP density plot
✅ "c" → circle-Manhattan plot
✅ "m" → Manhattan plot
✅ "q" → Q-Q plot
✅ "b" → circle-Manhattan, Manhattan, and Q-Q plots together
CMplot(pig60K,
       type="p",
       plot.type="c",             # circle-Manhattan plot
       chr.labels=paste("Chr",c(1:18,"X","Y"),sep=""),
       r=0.4,                     # radius of the circle
       cir.legend=TRUE,           # show the axis legend
       outward=FALSE,             # FALSE: points run from the outside inward; TRUE: from the inside outward
       cir.legend.col="black",    # colour of the legend axis
       cir.chr.h=1.3,             # width of the chromosome boundary band
       chr.den.col="black",       # colour of the SNP density band
       file="jpg",
       memo="",
       dpi=300,
       file.output=FALSE,
       verbose=FALSE,
       width=10, height=10)
Here we use the amplify = TRUE argument to make the significant points stand out.
See the code comments for the other parameters~ 🤜🤛
CMplot(pig60K,
       type="p",
       plot.type="c",
       r=0.4,
       col=c("#E64B35E5","#4DBBD5E5","#00A087E5","#3C5488E5"),  # point colours
       chr.labels=paste("Chr",c(1:18,"X","Y"),sep=""),
       threshold=c(1e-6,1e-4),    # significance thresholds
       cir.chr.h=1.5,             # width of the chromosome boundary band
       amplify=TRUE,              # enlarge the points that pass the significance threshold
       threshold.lty=c(1,2),      # line type for each threshold
       threshold.col=c("#91D1C2E5", "#DC0000E5"),  # line colour for each threshold
       signal.line=1,             # width of the lines drawn across the circle at significant SNPs
       signal.col=c("#91D1C2E5","#DC0000E5"),      # colours of the significant points
       chr.den.col=c("#91D1C2E5","white","#DC0000E5"),  # colour gradient for SNP density
       bin.size=1e6,
       outward=FALSE,
       file="jpg",
       memo="",
       dpi=300,
       file.output=FALSE,
       verbose=FALSE,
       width=10, height=10)
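To see which SNPs would clear each threshold line, you can do the comparison by hand. The sketch below uses made-up p-values with two planted signals and the same two thresholds as the call above. It only shows the arithmetic behind a threshold line; it does not reproduce how CMplot decides which points to enlarge, and `p_values` is a hypothetical vector.

```r
# Deterministic, made-up p-values for 1,000 hypothetical SNPs.
p_values <- seq(0.001, 1, length.out = 1000)
p_values[c(10, 500)] <- c(3e-8, 5e-5)  # plant two signals by hand

thresholds <- c(1e-6, 1e-4)  # same values as in the CMplot call above

# On the plot the y-axis is -log10(p), so each threshold becomes a line at -log10(threshold).
line_heights <- -log10(thresholds)

# SNPs that pass the lenient (1e-4) and the strict (1e-6) threshold.
passes_lenient <- which(p_values < max(thresholds))
passes_strict  <- which(p_values < min(thresholds))

print(line_heights)
print(passes_lenient)
print(passes_strict)
```

A smaller p-value means a taller point on the -log10 scale, so the strict threshold is the line further from the baseline.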
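A few concepts worth adding here:
✅ Genomic selection: uses high-density SNP markers covering the whole genome, combined with phenotypic or pedigree records, to estimate the breeding values of individuals. It assumes that every QTL controlling the trait is in linkage disequilibrium with at least one of these markers.
✅ Reference population / ✅ Candidate population: genomic selection splits the population into a reference population and a candidate population. The reference population has both phenotypes and genotypes; the candidate population has genotypes only. A model is fitted on the reference population and then used to estimate the breeding values (or predict the phenotypes) of the candidates. The efficiency of genomic selection depends mainly on the size of the reference population and on how closely it is related to the candidate population.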
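The reference/candidate split can be sketched with simulated data. The example below fits a simple ridge regression on SNP genotypes (in the spirit of RR-BLUP, but not the method of any particular package) using the reference animals, then predicts genomic breeding values for candidates that have genotypes only. Everything here is an assumption made for illustration: the population sizes, the simulated effects, and the shrinkage value `lambda`.

```r
set.seed(1)
n_ref  <- 200   # reference animals: genotypes + phenotypes
n_cand <- 50    # candidate animals: genotypes only
n_snp  <- 500
n_all  <- n_ref + n_cand

# Simulated genotypes coded as 0/1/2 copies of an allele.
geno <- matrix(rbinom(n_all * n_snp, size = 2, prob = 0.3),
               nrow = n_all, ncol = n_snp)

# Simulated true SNP effects, breeding values, and noisy phenotypes.
snp_effects <- rnorm(n_snp, mean = 0, sd = 0.1)
true_bv     <- as.vector(geno %*% snp_effects)
phenotype   <- true_bv + rnorm(n_all, sd = sd(true_bv))

ref  <- seq_len(n_ref)
cand <- n_ref + seq_len(n_cand)

# Centre genotypes with the reference means (candidate phenotypes are never used).
snp_means <- colMeans(geno[ref, ])
Z_ref  <- sweep(geno[ref, ], 2, snp_means)
Z_cand <- sweep(geno[cand, ], 2, snp_means)
y_ref  <- phenotype[ref] - mean(phenotype[ref])

# Ridge estimate of the SNP effects: (Z'Z + lambda * I)^-1 Z'y.
lambda <- 200   # assumed shrinkage; in practice it is estimated from variance components
est_effects <- solve(crossprod(Z_ref) + lambda * diag(n_snp),
                     crossprod(Z_ref, y_ref))

# Genomic estimated breeding values for the candidates.
gebv <- as.vector(Z_cand %*% est_effects)

# Prediction accuracy; this is only computable because the data are simulated.
print(cor(gebv, true_bv[cand]))
```

Try changing `n_ref` to see how the size of the reference population affects the accuracy.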
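CMplot(cattle50K, type="p",
       plot.type="c",
       LOG10=FALSE,               # plot the values as they are, without the -log10 transform
       outward=TRUE,              # points run from the inside outward
       col=matrix(c("#4DAF4A", NA, NA,
                    "dodgerblue4", "deepskyblue", NA,
                    "dodgerblue1", "olivedrab3", "darkgoldenrod1"),
                  nrow=3,
                  byrow=TRUE),    # one row of colours per circle, padded with NA
       chr.den.col="black",
       chr.labels=paste("Chr",c(1:29),sep=""),
       threshold=NULL,            # no threshold lines
       r=1.2,                     # radius of the circle
       cir.chr.h=1.5,             # width of the chromosome boundary band
       cir.legend.cex=0.5,        # size of the legend text
       cir.band=1,                # gap between circles
       file="jpg",
       memo="",
       dpi=300,
       file.output=FALSE,
       verbose=FALSE,
       width=10, height=10)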
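The `col` matrix in the call above is easier to read once it is unpacked. As far as I understand the CMplot documentation, each row holds the colours for one circle (one trait), and rows that use fewer colours are padded with `NA`. The sketch below rebuilds the same matrix and lists the colours each circle actually gets; `track_colors` and `colors_per_track` are names made up for this example.

```r
# Same colour matrix as in the call above: 3 rows, filled row by row.
track_colors <- matrix(c("#4DAF4A", NA, NA,
                         "dodgerblue4", "deepskyblue", NA,
                         "dodgerblue1", "olivedrab3", "darkgoldenrod1"),
                       nrow = 3, byrow = TRUE)

# Drop the NA padding to get the colours used by each circle.
colors_per_track <- lapply(seq_len(nrow(track_colors)), function(i) {
  row_colors <- track_colors[i, ]
  row_colors[!is.na(row_colors)]
})

print(colors_per_track)
print(lengths(colors_per_track))  # number of colours per circle
```

So the first circle uses a single colour, the second alternates between two, and the third cycles through three.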
Finally, here's hoping we all get out of the rat race soon!~