PRSice Usage Documentation (English Version)

生信与临床
Published 2020-08-27 08:57:43

usage: Rscript PRSice.R [options] <-b base_file> <-t target_file> <--prsice prsice_location>

Required:
  --prsice              Location of the PRSice binary
  --dir                 Location to install ggplot. Only required if ggplot is not installed

Base File:
  --base-info           Base INFO score filtering. Format should be <Column name>:<Threshold>. SNPs with an info score less than <Threshold> will be ignored. Column name default: INFO. Threshold default: 0.9
  --base-maf            Base MAF filtering. Format should be <Column name>:<Threshold>. SNPs with a MAF less than <Threshold> will be ignored. An additional column can also be added (e.g. to also filter MAF for cases) using the following format: <Column name>:<Threshold>,<Column name>:<Threshold>
  --beta                Whether the test statistic is in the form of BETA or OR. If set, the test statistic is assumed to be in the form of BETA. Mutually exclusive with --or
  --bp                  Column header containing the SNP coordinate. Default: BP
  --chr                 Column header containing the chromosome. Default: CHR
  --index               If set, column INDICES instead of column NAMES are assumed to be provided for the corresponding columns. Indices should be 0-based (start counting from 0)
  --no-default          Remove all default options. If set, PRSice will not set any default column name, and you must manually provide all required columns (--snp, --stat, --A1, --pvalue)
  --or                  Whether the test statistic is in the form of BETA or OR. If set, the test statistic is assumed to be in the form of OR. Mutually exclusive with --beta
  --pvalue | -p         Column header containing the p-value. Default: P
  --snp                 Column header containing the SNP ID. Default: SNP
  --stat                Column header containing the summary statistic. If --beta is set, defaults to BETA; otherwise, PRSice will search the header of the base file for OR or BETA

Target File:
  --binary-target       Indicate whether the target phenotype is binary. Either T or F should be provided, where T represents a binary phenotype. For multiple phenotypes, the values should be separated by commas without spaces. Default: T if --beta is set and F if --beta is not set
  --info                Filter SNPs based on info score. Only used for imputed target
  --keep                File containing the sample(s) to be extracted from the target file. The first column should be FID and the second column should be IID. If --ignore-fid is set, the first column should be IID. Mutually exclusive with --remove
  --maf                 Filter SNPs based on minor allele frequency (MAF)
  --nonfounders         Keep the nonfounders in the analysis. Note: they will still be excluded from LD calculation
  --pheno | -f          Phenotype file containing the phenotype(s). The first column must be the FID of the samples and the second column must be the IID of the samples. When --ignore-fid is set, the first column must be the IID of the samples. Must contain a header if --pheno-col is specified
  --pheno-col | -F      Headers of the phenotypes to be included from the phenotype file
  --prevalence | -k     Prevalence of all binary traits. If provided, the R2 will be adjusted for ascertainment bias. Note that when multiple binary traits are found, prevalence information must be provided for all of them
  --remove              File containing the sample(s) to be removed from the target file. The first column should be FID and the second column should be IID. If --ignore-fid is set, the first column should be IID. Mutually exclusive with --keep
  --target | -t         Target genotype file. Currently supports both BGEN and binary PLINK format. For multiple chromosome input, simply substitute the chromosome number with #; PRSice will automatically replace # with 1-22. For binary PLINK format, you can also specify a separate fam file by <prefix>,<fam file>
  --target-list         File containing the prefixes of target genotype files. Similar to --target but allows more flexibility. Does not support an external fam file at the moment
  --type                File type of the target file. Supports bed (binary PLINK) and bgen format. Default: bed

Dosage:
  --allow-inter         Allow the generation of intermediate files. This will speed up PRSice when dosage data are used as the clumping reference and for hard-coded PRS calculation
  --dose-thres          Translate any SNP whose highest genotype probability is less than this threshold to a missing call
  --hard-thres          A hardcall is saved when the distance to the nearest hardcall is less than the hardcall threshold. Otherwise, a missing code is saved. Default: 0.1
  --hard                Use hard coding instead of dosage for PRS construction. The default is to use dosage instead of hard coding

Clumping:
  --clump-kb            The distance for clumping in kb. Default: 250kb (1mb for PRSet)
  --clump-r2            The R2 threshold for clumping. Default: 0.1
  --clump-p             The p-value threshold used for clumping. Default: 1
  --ld | -L             LD reference file, used for LD calculation. If not provided, the post-filtered target genotypes will be used for LD calculation. Supports multiple chromosome input; see --target for more information
  --ld-dose-thres       Translate any SNP whose highest genotype probability is less than this threshold to a missing call
  --ld-geno             Filter SNPs based on genotype missingness
  --ld-hard-thres       A hardcall is saved when the distance to the nearest hardcall is less than the hardcall threshold. Otherwise, a missing code is saved. Default: 0.1
  --ld-info             Filter SNPs based on info score. Only used for imputed LD reference
  --ld-keep             File containing the sample(s) to be extracted from the LD reference file. The first column should be FID and the second column should be IID. If --ignore-fid is set, the first column should be IID. Mutually exclusive with --ld-remove. No effect if --ld was not provided
  --ld-list             File containing the prefixes of LD reference files. Similar to --ld but allows more flexibility. Does not support an external fam file at the moment
  --ld-maf              Filter SNPs based on minor allele frequency
  --ld-remove           File containing the sample(s) to be removed from the LD reference file. The first column should be FID and the second column should be IID. If --ignore-fid is set, the first column should be IID. Mutually exclusive with --ld-keep
  --ld-type             File type of the LD file. Supports bed (binary PLINK) and bgen format. Default: bed
  --no-clump            Stop PRSice from performing clumping
  --proxy               Proxy threshold for an index SNP to be considered part of the region represented by the clumped SNP(s). e.g. --proxy 0.8 means the index SNP will represent the region of any clumped SNP(s) with an R2>=0.8, even if the index SNP is not physically located within the region

Covariate:
  --cov | -C            Covariate file. The first column should be FID and the second column should be IID. If --ignore-fid is set, the first column should be IID
  --cov-col | -c        Headers of the covariates. If not provided, all variables in the covariate file will be used. By adding @ in front of the string, any numbers within [ and ] will be parsed, e.g. @PC[1-3] will be read as PC1,PC2,PC3. Discontinuous input is also supported: @cov[1.3-5] will be parsed as cov1,cov3,cov4,cov5
  --cov-factor          Headers of categorical covariate(s). Dummy variables will be generated automatically. Any item in --cov-factor must also be found in --cov-col. Also accepts range input (starting with @)

P-value Thresholding:
  --bar-levels          Levels of the bar chart to be plotted. When --fastscore is set, PRSice will only calculate the PRS for thresholds within the bar levels. Levels should be comma separated without spaces
  --fastscore           Only calculate the thresholds stated in --bar-levels
  --no-full             By default, PRSice will include the full model, i.e. p-value threshold = 1. Setting this flag disables that behaviour
  --interval | -i       The step size of the threshold. Default: 0.00005
  --lower | -l          The starting p-value threshold. Default: 5e-8
  --model               Genetic model used for regression. The genetic encoding is based on the base data, where the encoding represents the number of copies of the coding allele. Available models:
                        add - Additive model, coded as 0/1/2 (default)
                        dom - Dominant model, coded as 0/1/1
                        rec - Recessive model, coded as 0/0/1
                        het - Heterozygous-only model, coded as 0/1/0
  --missing             Method to handle missing genotypes. By default, final scores are averages of valid per-allele scores, with missing genotypes contributing an amount proportional to the imputed allele frequency. To throw out missing observations instead (decreasing the denominator in the final average when this happens), use the 'SET_ZERO' modifier. Alternatively, use the 'CENTER' modifier to shift all scores to mean zero
  --no-regress          Do not perform the regression analysis; simply output all PRS
  --score               Method to calculate the polygenic score. Available methods:
                        avg - Take the average effect size (default)
                        std - Standardize the effect size
                        con-std - Standardize the effect size using the mean and SD derived from control samples
                        sum - Direct summation of the effect size
  --upper | -u          The final p-value threshold. Default: 0.5

PRSet:
  --background          String indicating a background file, in the format Name:Type, where Type can be:
                        bed - 0-based range with 3 columns: Chr Start End
                        range - 1-based range with 3 columns: Chr Start End
                        gene - A file containing a column of gene names
  --bed | -B            Bed file containing the selected regions. The name of the bed file will be used as the region identifier. WARNING: Bed files are 0-based
  --feature             Feature(s) to be included from the gtf file. Default: exon,CDS,gene,protein_coding
  --full-back           Use the whole genome as the background for competitive p-value calculation
  --gtf | -g            GTF file containing gene boundaries. Required when --msigdb is used
  --msigdb | -m         MSigDB file containing the pathway information. Requires the gtf file
  --snp-set             SNP set file containing the SNP set(s). Two file formats are allowed:
                        SNP list format - A file containing a single column of SNP IDs. The name of the set will be the file name, or can be provided using --snp-set File:Name
                        MSigDB format - Each row represents a single SNP set, with the first column containing the name of the SNP set
  --wind-3              Add N base(s) to the 3' region of each feature
  --wind-5              Add N base(s) to the 5' region of each feature

Plotting:
  --bar-col-high        Colour of the most predictive threshold. Default: firebrick
  --bar-col-lower       Colour of the least predictive threshold. Default: dodgerblue
  --bar-col-p           Colour the bars by p-value threshold instead of by the association with the phenotype
  --bar-palatte         Colour palette to be used for bar plotting when --bar-col-p is set. Default: YlOrRd
  --device              Plotting device to use. Any plotting device supported by base R can be chosen. Default: png
  --multi-plot          Plot the top N phenotypes / gene sets in a summary plot
  --plot                When set, only plotting will be performed
  --plot-set            Gene set to be plotted. Default: Base
  --quantile | -q       Number of quantiles to plot. No quantile plot will be generated when this is not provided
  --quant-break         Quantile groupings for plotting the strata plot
  --quant-extract | -e  File containing the IDs of samples to be plotted in a separate quantile, e.g. an extra quantile containing only schizophrenia samples. Must contain IID. Should contain FID if --ignore-fid isn't set
  --quant-ref           Reference quantile for the quantile plot
  --scatter-r2          Use R2 as the y-axis of the high-resolution scatter plot

Misc:
  --all-score           Output PRS for ALL thresholds. WARNING: This will generate a huge file
  --chr-id              Try to construct an RS ID for each SNP based on its chromosome, coordinate, effective allele and non-effective allele, e.g. c:L-aBd is translated to <chr>:<coordinate>-<effective><noneffective>d. This is always applied to the target file, whereas for the base file it is only used if the RS ID wasn't provided
  --exclude             File containing SNPs to be excluded from the analysis
  --extract             File containing SNPs to be included in the analysis
  --id-delim            Causes sample IDs to be parsed as <FID><delimiter><IID>; the default delimiter is '_'
  --ignore-fid          Ignore FID for all input. When this is set, the first column of every file is assumed to be IID instead of FID
  --keep-ambig          Keep ambiguous SNPs. Only use this option if you are certain that the base and target have the same A1 and A2 alleles
  --logit-perm          When performing permutation, still use logistic regression instead of linear regression. This will substantially slow down PRSice
  --memory              Maximum memory usage allowed (in Mb). PRSice will try its best to honor this setting
  --non-cumulate        Calculate non-cumulative PRS. The PRS will be reset to 0 for each new p-value threshold instead of adding up
  --out | -o            Prefix for all output files
  --perm                Number of permutations to perform. This will generate the empirical p-value. A value larger than 10,000 is recommended
  --print-snp           Print all SNPs that remain in the analysis after clumping is performed. For PRSet, Y indicates that the SNP falls within the gene set of interest and N otherwise. If only PRSice is performed, a single "gene set" called "Base" will be presented with all entries marked as Y
  --seed | -s           Seed used for permutation. If not provided, the system time will be used as the seed. Given the same seed and the same input, the same result will be generated
  --thread | -n         Number of threads to use
  --use-ref-maf         When specified, missingness imputation will be performed based on the reference samples
  --ultra               Ultra-aggressive memory usage. When this is enabled, PRSice and PRSet will try to load all genotypes into memory after clumping is performed. This should drastically speed up PRSice and PRSet at the expense of higher memory consumption. Has no effect for dosage scores
  --x-range             Range of SNPs to be excluded from the whole analysis. It can be either a single bed file or a comma-separated list of ranges. Ranges must be in the format chr:start-end or chr:coordinate
  --help | -h           Display this help message
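The <Column name>:<Threshold> syntax of --base-info and --base-maf can be illustrated with a short Python sketch. It is not PRSice code: it assumes the base file has already been read into dictionaries keyed by column header, and the MAF_CASE column and the example rows are hypothetical. Following the help text, a SNP is dropped when any filtered value is below its threshold.

```python
from typing import Dict, List, Tuple

Filter = Tuple[str, float]

def parse_filter_arg(arg: str) -> List[Filter]:
    """Parse '<Column name>:<Threshold>[,<Column name>:<Threshold>]' into pairs."""
    filters = []
    for item in arg.split(","):
        column, threshold = item.split(":")  # exactly one ':' expected per item
        filters.append((column, float(threshold)))
    return filters

def passes_filters(row: Dict[str, str], filters: List[Filter]) -> bool:
    """Keep a SNP only if none of the filtered columns is below its threshold."""
    return all(float(row[column]) >= threshold for column, threshold in filters)

# Hypothetical base-file rows; values are strings, as if read from a text file.
base_rows = [
    {"SNP": "rs1", "INFO": "0.95", "MAF": "0.20", "MAF_CASE": "0.18"},
    {"SNP": "rs2", "INFO": "0.85", "MAF": "0.30", "MAF_CASE": "0.25"},
    {"SNP": "rs3", "INFO": "0.99", "MAF": "0.02", "MAF_CASE": "0.004"},
]

info_filter = parse_filter_arg("INFO:0.9")               # the documented defaults
maf_filter = parse_filter_arg("MAF:0.01,MAF_CASE:0.01")  # two-column form
kept = [r["SNP"] for r in base_rows if passes_filters(r, info_filter + maf_filter)]
print(kept)  # ['rs1']
```

Here rs2 fails the INFO filter and rs3 fails the hypothetical case-MAF filter, so only rs1 survives.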
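The two conventions described for --target (and, by reference, --ld) are the # placeholder for chromosomes 1-22 and the optional <prefix>,<fam file> form. Both can be sketched in Python as below. The function name and file paths are hypothetical, and the sketch only expands the strings; it does not check that any file exists.

```python
from typing import List, Optional, Tuple

def expand_target(arg: str) -> Tuple[List[str], Optional[str]]:
    """Expand a --target style argument into genotype prefixes and an optional fam file."""
    # '<prefix>,<fam file>': everything after the first comma is the external fam file.
    prefix, _, fam = arg.partition(",")
    fam_file = fam or None
    if "#" in prefix:
        # '#' stands for the chromosome number; expand it to autosomes 1-22.
        prefixes = [prefix.replace("#", str(chrom)) for chrom in range(1, 23)]
    else:
        prefixes = [prefix]
    return prefixes, fam_file

prefixes, fam = expand_target("data/chr#_qc")
print(prefixes[0], prefixes[-1], fam)     # data/chr1_qc data/chr22_qc None
print(expand_target("geno,samples.fam"))  # (['geno'], 'samples.fam')
```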
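The following Python sketch shows one reading of --dose-thres and --hard-thres. It assumes genotype probabilities are given in the order P(0), P(1), P(2) copies of the coding allele and that the dosage is the expected allele count. Both points, and the sketch's default of 0 for dose_thres, are assumptions of this illustration rather than details taken from PRSice.

```python
from typing import Optional, Sequence

def hard_call(probs: Sequence[float], hard_thres: float = 0.1,
              dose_thres: float = 0.0) -> Optional[int]:
    """Turn (P(0), P(1), P(2)) genotype probabilities into 0/1/2, or None if missing."""
    # --dose-thres: if even the most likely genotype is too uncertain, call it missing.
    if max(probs) < dose_thres:
        return None
    dosage = probs[1] + 2 * probs[2]   # expected count of the coding allele, 0..2
    nearest = round(dosage)            # nearest hardcall
    # --hard-thres: keep the hardcall only if the dosage lies close enough to it.
    if abs(dosage - nearest) < hard_thres:
        return nearest
    return None

print(hard_call((0.02, 0.95, 0.03)))               # 1
print(hard_call((0.0, 0.7, 0.3)))                  # None (dosage about 1.3, too far from 1)
print(hard_call((0.3, 0.4, 0.3)))                  # 1 (dosage 1.0)
print(hard_call((0.3, 0.4, 0.3), dose_thres=0.9))  # None (max probability 0.4 < 0.9)
```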
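Clumping of the kind controlled by --clump-p, --clump-kb and --clump-r2 is usually done greedily: the most significant remaining SNP becomes an index SNP, and nearby SNPs in LD with it are removed. The Python sketch below assumes PRSice behaves similarly but is not its implementation. LD comes from a hypothetical toy_r2 lookup instead of being computed from the LD reference, r2 at or above the threshold is treated as linked, and --proxy is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

@dataclass
class Snp:
    name: str
    chrom: str
    bp: int
    p: float

def clump(snps: List[Snp], r2: Callable[[str, str], float],
          clump_kb: float = 250, clump_r2: float = 0.1,
          clump_p: float = 1.0) -> List[str]:
    """Greedy LD clumping; returns index SNP names, most significant first."""
    # Only SNPs passing --clump-p take part, processed from smallest p upwards.
    candidates = sorted((s for s in snps if s.p <= clump_p), key=lambda s: s.p)
    removed: Set[str] = set()
    index_snps: List[str] = []
    for index in candidates:
        if index.name in removed:
            continue
        index_snps.append(index.name)
        for other in candidates:
            if other.name in removed or other.name in index_snps:
                continue
            nearby = (other.chrom == index.chrom
                      and abs(other.bp - index.bp) <= clump_kb * 1000)
            # ASSUMPTION: r2 at or above --clump-r2 counts as "in LD".
            if nearby and r2(index.name, other.name) >= clump_r2:
                removed.add(other.name)
    return index_snps

# Toy LD lookup standing in for r2 computed from the LD reference (hypothetical).
ld = {frozenset(("rsA", "rsB")): 0.8}

def toy_r2(a: str, b: str) -> float:
    return ld.get(frozenset((a, b)), 0.0)

snps = [
    Snp("rsA", "1", 1_000, 1e-8),
    Snp("rsB", "1", 50_000, 1e-6),   # 49 kb from rsA, r2 = 0.8
    Snp("rsC", "1", 900_000, 1e-5),  # outside the 250 kb window
    Snp("rsD", "2", 1_000, 0.01),    # different chromosome
]
print(clump(snps, toy_r2))  # ['rsA', 'rsC', 'rsD']
```

Raising clump_r2 above 0.8, or shrinking clump_kb below 49, keeps rsB as well.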
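The @ expansion rule for --cov-col can be re-implemented in a few lines of Python, which makes the two documented examples easy to check. This is an illustrative parser, not the one PRSice uses. Treating the argument as a comma-separated list of column names is an assumption.

```python
import re
from typing import List

def expand_cov_token(token: str) -> List[str]:
    """Expand one covariate token such as 'Sex', '@PC[1-3]' or '@cov[1.3-5]'."""
    if not token.startswith("@"):
        return [token]  # ordinary column name, used verbatim
    match = re.fullmatch(r"@([^\[\]]*)\[([0-9.\-]+)\]([^\[\]]*)", token)
    if match is None:
        raise ValueError(f"cannot parse covariate token: {token!r}")
    prefix, body, suffix = match.groups()
    names = []
    # Inside [ ]: items separated by '.', each a single number or a 'start-end' range.
    for item in body.split("."):
        if "-" in item:
            start, end = (int(x) for x in item.split("-"))
            numbers = range(start, end + 1)
        else:
            numbers = range(int(item), int(item) + 1)
        names.extend(f"{prefix}{n}{suffix}" for n in numbers)
    return names

def expand_cov_col(arg: str) -> List[str]:
    """Expand a comma-separated --cov-col argument (comma separation is assumed)."""
    return [name for token in arg.split(",") for name in expand_cov_token(token)]

print(expand_cov_col("@PC[1-3]"))     # ['PC1', 'PC2', 'PC3']
print(expand_cov_col("@cov[1.3-5]"))  # ['cov1', 'cov3', 'cov4', 'cov5']
```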
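The interplay of --lower, --upper, --interval, --no-full and --non-cumulate can be sketched in Python. The grid construction is an assumption made for illustration: lower, lower + interval, and so on up to upper, then 1 for the full model. PRSice may place its thresholds differently. The second function shows the documented difference between the two scoring modes. In cumulative scoring, every SNP with p at or below the threshold contributes. With --non-cumulate, only the SNPs entering at that threshold contribute.

```python
from typing import Dict, List

def threshold_grid(lower: float = 5e-8, upper: float = 0.5,
                   interval: float = 5e-5, full: bool = True) -> List[float]:
    """Thresholds lower, lower + interval, ... <= upper, plus 1 unless --no-full."""
    grid = []
    k = 0
    while lower + k * interval <= upper + 1e-12:  # small tolerance for float drift
        grid.append(lower + k * interval)
        k += 1
    if full and (not grid or grid[-1] < 1.0):
        grid.append(1.0)  # the "full model" threshold
    return grid

def snps_per_threshold(pvalues: Dict[str, float], grid: List[float],
                       cumulative: bool = True) -> Dict[float, List[str]]:
    """SNPs contributing to the PRS at each threshold.

    cumulative=True : every SNP with p <= threshold (the default behaviour).
    cumulative=False: only SNPs with previous threshold < p <= threshold,
                      i.e. the score restarts at each threshold (--non-cumulate).
    """
    result = {}
    previous = float("-inf")
    for t in grid:
        low = float("-inf") if cumulative else previous
        result[t] = [snp for snp, p in pvalues.items() if low < p <= t]
        previous = t
    return result

pvals = {"rs1": 1e-9, "rs2": 0.01, "rs3": 0.3, "rs4": 0.9}  # toy p-values
grid = [5e-8, 0.05, 0.5, 1.0]
print(snps_per_threshold(pvals, grid))
print(snps_per_threshold(pvals, grid, cumulative=False))
```

With --fastscore, the grid would instead be the comma-separated --bar-levels values.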
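The four --model encodings map the number of copies of the coding allele (0, 1 or 2) to the value used in the regression. A lookup table in Python reproduces the help text directly. The encode helper is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Value used in the regression for 0, 1 and 2 copies of the coding allele.
MODEL_CODES: Dict[str, Tuple[int, int, int]] = {
    "add": (0, 1, 2),  # additive (default)
    "dom": (0, 1, 1),  # dominant
    "rec": (0, 0, 1),  # recessive
    "het": (0, 1, 0),  # heterozygous only
}

def encode(allele_counts: Sequence[int], model: str = "add") -> List[int]:
    """Recode coding-allele counts (0/1/2) under the chosen --model."""
    codes = MODEL_CODES[model]
    return [codes[count] for count in allele_counts]

counts = [0, 1, 2]
for model in MODEL_CODES:
    print(model, encode(counts, model))
```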
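The --score methods can be compared on a toy genotype matrix. In this Python sketch, sum follows the help text directly, while the other three rest on assumptions. Here avg divides by ploidy times the number of SNPs, and std / con-std standardize the summed scores using the mean and SD of either all samples or the controls only. The data are made up.

```python
from statistics import mean, stdev
from typing import List, Optional, Sequence

def prs(genotypes: Sequence[Sequence[int]], betas: Sequence[float],
        method: str = "avg", is_control: Optional[Sequence[bool]] = None,
        ploidy: int = 2) -> List[float]:
    """Per-sample polygenic score under the --score methods (illustrative only)."""
    # 'sum': direct summation of effect size x coded genotype.
    sums = [sum(b * g for b, g in zip(betas, row)) for row in genotypes]
    if method == "sum":
        return sums
    if method == "avg":
        # ASSUMPTION: average over ploidy x number of SNPs.
        return [s / (ploidy * len(betas)) for s in sums]
    if method in ("std", "con-std"):
        # ASSUMPTION: standardize summed scores to mean 0, SD 1 using all
        # samples (std) or only the controls (con-std) as the reference.
        if method == "std":
            reference = sums
        else:
            if is_control is None:
                raise ValueError("con-std needs is_control")
            reference = [s for s, c in zip(sums, is_control) if c]
        mu, sd = mean(reference), stdev(reference)
        return [(s - mu) / sd for s in sums]
    raise ValueError(f"unknown score method: {method}")

genotypes = [[0, 1], [2, 2], [1, 0], [2, 0]]  # 4 samples x 2 SNPs (toy data)
betas = [1.0, -2.0]
print(prs(genotypes, betas, "sum"))  # [-2.0, -2.0, 1.0, 2.0]
print(prs(genotypes, betas, "avg"))  # [-0.5, -0.5, 0.25, 0.5]
```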
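An empirical p-value like the one produced by --perm can be illustrated by shuffling the phenotype and counting how often the permuted R2 is at least the observed one. This Python sketch uses a single score, a linear-model R2 and the (hits + 1) / (permutations + 1) estimator. All three are assumptions, and the sketch does not reproduce how PRSice handles many thresholds or --logit-perm. Seeding random.Random mirrors the role of --seed: the same seed gives the same result.

```python
import random
from typing import Sequence

def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R2 of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def empirical_p(score: Sequence[float], pheno: Sequence[float],
                n_perm: int = 10_000, seed: int = 42) -> float:
    """Permutation p-value: how often a shuffled phenotype fits as well as the real one."""
    rng = random.Random(seed)         # fixed seed -> reproducible result
    observed = r_squared(score, pheno)
    shuffled = list(pheno)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)         # break any score-phenotype link
        if r_squared(score, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # ASSUMPTION: add-one estimator

score = [1, 2, 3, 4, 5, 6, 7, 8]                  # toy PRS
pheno = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]  # toy phenotype
print(empirical_p(score, pheno, n_perm=999, seed=1))
```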
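For the comma-separated form of --x-range, a Python sketch of parsing and applying the ranges is shown below. It ignores the bed-file form. Treating both ends of a range as inclusive coordinates is an assumption, and the example ranges are arbitrary.

```python
from typing import List, Tuple

Range = Tuple[str, int, int]

def parse_x_range(arg: str) -> List[Range]:
    """Parse comma-separated 'chr:start-end' or 'chr:coordinate' items."""
    ranges = []
    for item in arg.split(","):
        chrom, _, pos = item.partition(":")
        if "-" in pos:
            start, end = (int(x) for x in pos.split("-"))
        else:
            start = end = int(pos)  # a single coordinate is a one-base range
        ranges.append((chrom, start, end))
    return ranges

def is_excluded(chrom: str, bp: int, ranges: List[Range]) -> bool:
    """True if the SNP falls inside any range (inclusive bounds assumed)."""
    return any(c == chrom and start <= bp <= end for c, start, end in ranges)

ranges = parse_x_range("6:25000000-34000000,1:12345")
print(ranges)                              # [('6', 25000000, 34000000), ('1', 12345, 12345)]
print(is_excluded("6", 30000000, ranges))  # True
```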

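This article is part of the Tencent Cloud Self-Media Sync Exposure Program and is shared from a WeChat official account.
Originally published: 2020-08-25. In case of infringement, please contact cloudcommunity@tencent.com for removal.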

This article is shared from the 生信与临床 WeChat official account.
