Pathway Enrichment Analysis of GWAS Data with MAGMA

Source: 企鹅号 (Tencent Penguin account) - Freescience联盟

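The data and code for this post are stored on Baidu Netdisk. Link: https://eyun.baidu.com/s/3c1GJdNa Password: hWAG (alternatively, send the message "练习资料" (practice materials) to the account to receive the link and password).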

Many software tools and methods are available for analysing GWAS data, for example GSA-SNP, FORGE, MRPEA, INRICH, DGAT, ALIGATOR, MAGENTA and the Set screen test method. Each has its own strengths and weaknesses, so choose whichever fits your needs.

Here we introduce MAGMA, an analysis tool frequently used in recent papers in high-profile journals such as Nature, Nature Genetics and Nature Neuroscience.

The software's own description reads: MAGMA is a tool for gene analysis and generalized gene-set analysis of GWAS data. It can be used to analyse both raw genotype data as well as summary SNP p-values from a previous GWAS or meta-analysis.

In other words, MAGMA supports both gene-level and pathway-level (gene-set) analysis, and it accepts either raw GWAS genotype data or GWAS summary data. It is powerful yet easy to use, and it can be downloaded free of charge from the official site: https://ctg.cncr.nl/software/magma.

MAGMA runs on both Linux and Windows.

The MAGMA paper was published in PLoS Computational Biology:

de Leeuw C, Mooij J, Heskes T, Posthuma D (2015): MAGMA: Generalized gene-set analysis of GWAS data. PLoS Comput Biol 11(4): e1004219. doi:10.1371/journal.pcbi.1004219.

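First we need GWAS data. Your own raw GWAS data for a trait of interest is ideal; otherwise, you can download published GWAS summary data from a public database and analyse it to look for new findings. Here we download the smoking GWAS data, tag.evrsmk.tbl, from https://www.med.unc.edu/pgc/downloads.

Rename the file to ever_smoking.results (the original post showed a screenshot of the file layout here, which is not reproduced).

Because this TAG GWAS was published in Nature Genetics in 2010, its reference genome build is hg18, which is rather old. We therefore lift the coordinates over to hg19 before the downstream analysis. The code is as follows: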

1) Use Picard to convert a VCF file between genome builds, e.g. from hg18 to hg19:

java -jar picard.jar LiftoverVcf \
    I=input.vcf \
    O=lifted_over.vcf \
    CHAIN=hg18tohg19.chain \
    REJECT=rejected_variants.vcf \
    R=reference_sequence.fasta

2) Use the UCSC liftOver tool to convert the summary data from hg18 to hg19:

./liftOver -bedPlus=4 ever_smoking.results hg18ToHg19.over.chain ever_smoking.results.hg19.bed ever_smoking.results_unmapped.txt
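liftOver reads BED input, so the summary statistics have to be reshaped before the command above can run. The following is a minimal sketch of that reshaping on a toy file. The column layout (CHR SNP BP A1 A2 OR SE P) and the toy_* file names are assumptions for illustration only and may not match the real tag.evrsmk.tbl file.

```shell
# Toy summary-statistics file; the column layout is an ASSUMPTION,
# not the verified layout of the TAG download.
cat > toy_smoking.results <<'EOF'
CHR SNP BP A1 A2 OR SE P
1 rs1000 1500 A G 1.02 0.01 0.04
15 rs2000 78000 T C 1.10 0.02 3e-8
EOF

# BED intervals are 0-based and half-open, so a SNP at 1-based
# position BP becomes the interval [BP-1, BP).
# Output columns: chrom, start, end, SNP ID, P-value.
awk 'BEGIN { OFS = "\t" }
     NR > 1 { print "chr" $1, $3 - 1, $3, $2, $8 }' \
    toy_smoking.results > toy_smoking.hg18.bed

cat toy_smoking.hg18.bed
```

With -bedPlus=4, only the first four columns are treated as BED fields, so the P-value in the fifth column is carried through the liftover unchanged.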


Next, we use MAGMA to annotate SNPs to genes.

### Annotation is performed with the following command:

Command pattern: magma --annotate window=5,1.5 --snp-loc [SNPLOC_FILE] --gene-loc [GENELOC_FILE] --out [ANNOT_PREFIX]

The optional window modifier extends each gene's boundaries by the given number of kilobases upstream and downstream.

The SNP location file should contain three columns, with no header: SNP ID, chromosome, and base pair position.

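Create the SNP location file (the awk program between the quotes was lost from the source; it needs to print the SNP ID, chromosome and position columns):

gawk '' ever_smoking.results.hg19.bed > ever_smoking.results.hg19.location

sed -i "s/chr//g" ever_smoking.results.hg19.location   # strip the "chr" prefix from the chromosome column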
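As a sketch of what this step might look like, the awk call below builds a three-column location file from a lifted-over BED file. It assumes the BED columns are chrom, start, end and SNP ID in that order. That layout and the toy_* file names are hypothetical, so adjust the field numbers to your own file.

```shell
# Toy lifted-over BED file (tab-separated); the layout is an ASSUMPTION.
printf 'chr1\t1499\t1500\trs1000\t0.04\nchr15\t77999\t78000\trs2000\t3e-8\n' \
    > toy_smoking.hg19.bed

# MAGMA wants: SNP ID, chromosome, base pair position (no header).
# The BED end coordinate equals the 1-based SNP position.
awk 'BEGIN { OFS = "\t" }
     {
         chrom = $1
         sub(/^chr/, "", chrom)   # drop the "chr" prefix here instead of using sed
         print $4, chrom, $3
     }' toy_smoking.hg19.bed > toy_smoking.hg19.location

cat toy_smoking.hg19.location
```

Stripping the prefix inside awk touches only the chromosome field, which avoids a global substitution over the whole file.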


Create the file that maps each SNP to its P-value (ever_smoking.results_Pval, used in step 2 below):
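The command for this step did not survive in the source, so the following is only a sketch of one way to do it. It assumes the summary file has a header row with columns named SNP and P, which is a hypothetical layout, and it writes a two-column file whose header uses those same names. MAGMA looks for the SNP and P columns by default when the P-value file has a header.

```shell
# Toy summary-statistics file; column names and layout are ASSUMPTIONS.
cat > toy_pval_input.results <<'EOF'
CHR SNP BP A1 A2 OR SE P
1 rs1000 1500 A G 1.02 0.01 0.04
15 rs2000 78000 T C 1.10 0.02 3e-8
EOF

# Locate the SNP and P columns by header name, then print just those two.
awk '
NR == 1 {
    for (i = 1; i <= NF; i++) col[$i] = i
    if (!("SNP" in col) || !("P" in col)) {
        print "input needs SNP and P columns" > "/dev/stderr"
        exit 1
    }
    print "SNP", "P"
    next
}
{ print $(col["SNP"]), $(col["P"]) }
' toy_pval_input.results > toy_smoking.results_Pval

cat toy_smoking.results_Pval
```

Looking columns up by name rather than by position keeps the command working if the column order differs between downloads.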


1# SNP annotation for the ever-smoking TAG data:

nohup ./magma --snp-loc ./GWAS_Summary_SCZ_Smoking/ever_smoking.results.hg19.location --annotate window=35,10 --gene-loc NCBI37.3.gene.loc --out ever_smoking_SNP_Gene_annotation &


2# Gene-based analysis of the ever-smoking TAG data:

nohup ./magma --bfile g1000_eur --pval ./GWAS_Summary_SCZ_Smoking/ever_smoking.results_Pval N=69409 --gene-annot ever_smoking_SNP_Gene_annotation.genes.annot --out ever_smoking_SNP_Gene_Analysis_P &
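Once the gene analysis has finished, a common next step is to list the genes that pass a Bonferroni threshold. The sketch below does this on a toy results table. The file name, the gene IDs and all numbers are made up for illustration, and the only assumption about the real output is that it has a header row with a column named P.

```shell
# Toy gene-level results; IDs and values are invented for illustration.
cat > toy_genes.out <<'EOF'
GENE CHR START STOP NSNPS NPARAM N ZSTAT P
101 15 1000 2000 120 25 69409 6.1 5.0e-10
102 7 3000 4000 80 18 69409 1.2 0.11
103 10 5000 6000 60 12 69409 2.4 0.02
EOF

# Keep genes with P < 0.05 / (number of genes tested).
awk '
NR == 1 {
    for (i = 1; i <= NF; i++) if ($i == "P") pcol = i
    if (!pcol) { print "no P column found" > "/dev/stderr"; exit 1 }
    header = $0
    next
}
{ row[++n] = $0; p[n] = $pcol }
END {
    if (!pcol || n == 0) exit 1
    threshold = 0.05 / n
    print header
    for (i = 1; i <= n; i++) if (p[i] + 0 < threshold) print row[i]
}
' toy_genes.out > toy_genes.significant.txt

cat toy_genes.significant.txt
```

With three toy genes the threshold is 0.05 / 3, so only the first gene is kept.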


3# Gene-set (pathway-based) analysis of the ever-smoking TAG data:

nohup ./magma --gene-results ever_smoking_SNP_Gene_Analysis_P.genes.raw --model fwer=10000 --set-annot ./Pathways/GO_PANTHER_INGENUITY_KEGG_REACTOME_BIOCARTA_new --out ever_smoking_pathway_P &
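By default, the file passed to --set-annot is row-based: each row is a gene-set name followed by its gene IDs, separated by whitespace. The post does not say how its combined pathway file was built. As an illustration only, the sketch below converts a pathway collection in GMT format (name, description, then genes, tab-separated) into that layout. The toy file, its set names and its gene IDs are hypothetical.

```shell
# Toy GMT file: set name <TAB> description <TAB> gene IDs...
printf 'KEGG TOY PATHWAY\thttp://example.org\t101\t102\t103\nGO_TOY_TERM\tna\t102\t104\n' \
    > toy_pathways.gmt

# Drop the description column and make each set name a single token,
# because whitespace separates the fields in the output.
awk -F'\t' '
{
    name = $1
    gsub(/[ \t]+/, "_", name)
    line = name
    for (i = 3; i <= NF; i++) if ($i != "") line = line " " $i
    print line
}' toy_pathways.gmt > toy_pathways.sets

cat toy_pathways.sets
```

The gene IDs in the set file must use the same identifier type as the gene location file used for annotation, otherwise no genes will match.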


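Summary:

With the code and data above, we can analyse GWAS data at the gene and gene-set level and look for new findings. Many published papers mine public GWAS summary data in this way. The key is to settle on the scientific question you want to answer and then find suitable data to analyse.

For further reading, I recommend a good paper published in Nature Neuroscience in 2015 (PMID: 25599223): Network and Pathway Analysis Subgroup of Psychiatric Genomics Consortium. Psychiatric genome-wide association study analyses implicate neuronal, immune and histone pathways. Nat Neurosci. 2015 Feb;18(2):199-209.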

Published: 2018-06-06 00:00:24
Original link: https://kuaibao.qq.com/s/20180606B08V9100?refer=cp_1026