
Genomics in Practice 01: Introduction

Original article by 生信探索, published 2023-03-25 11:56:45, in the column 生信探索.

1. What is GATK?

GATK stands for Genome Analysis Toolkit. It is a collection of command-line tools for analyzing high-throughput sequencing data, with a primary focus on variant discovery. The tools can be used individually or chained together into complete workflows. The GATK team provides end-to-end workflows, called GATK Best Practices, tailored to specific use cases.

Starting with version 4.0, GATK contains a copy of the Picard toolkit, so all Picard tools are available from within GATK itself.
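
As a rough illustration of the point above, the sketch below assembles a command line that runs a Picard tool (`MarkDuplicates`) through the `gatk` launcher instead of a separate Picard jar. The file names are hypothetical placeholders, and the helpers `gatk_command` and `run_if_available` are illustrative names, not part of GATK.

```python
import shutil
import subprocess


def gatk_command(tool, args):
    """Build an argument list of the form: gatk <ToolName> <arguments...>."""
    return ["gatk", tool] + list(args)


def run_if_available(cmd):
    """Run the command only if the gatk launcher is on the PATH."""
    if shutil.which(cmd[0]) is None:
        print("gatk not found on PATH; command not executed")
        return False
    subprocess.run(cmd, check=True)
    return True


# Hypothetical file names, for illustration only.
markdup_cmd = gatk_command(
    "MarkDuplicates",
    ["-I", "sample.sorted.bam",
     "-O", "sample.markdup.bam",
     "-M", "sample.dup_metrics.txt"],
)
print(" ".join(markdup_cmd))
```

Calling `run_if_available(markdup_cmd)` would execute the tool on a machine where GATK 4 is installed and the input file exists; here the command is only printed.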

2. Analysis phases

(1) Data pre-processing is the first phase in all cases: raw sequence data (provided in FASTQ or uBAM format) is processed into analysis-ready BAM files. This involves alignment to a reference genome as well as data cleanup operations that correct for technical biases and make the data suitable for analysis.
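
The text above does not name specific cleanup tools. One cleanup step commonly used in GATK workflows is base quality score recalibration (BQSR), and the sketch below shows how its two commands could be chained so that the output of the first feeds the second. It assumes an already aligned, duplicate-marked BAM; all file names and the function `preprocessing_commands` are hypothetical, and the commands are only printed, not run.

```python
def preprocessing_commands(bam, reference, known_sites, prefix):
    """Return GATK command lines for a two-step BQSR cleanup.

    Step 1 (BaseRecalibrator) builds a recalibration table.
    Step 2 (ApplyBQSR) uses that table to write the recalibrated BAM.
    """
    recal_table = prefix + ".recal.table"
    out_bam = prefix + ".analysis_ready.bam"
    build_model = [
        "gatk", "BaseRecalibrator",
        "-I", bam,
        "-R", reference,
        "--known-sites", known_sites,
        "-O", recal_table,
    ]
    apply_model = [
        "gatk", "ApplyBQSR",
        "-I", bam,
        "-R", reference,
        "--bqsr-recal-file", recal_table,
        "-O", out_bam,
    ]
    return [build_model, apply_model]


# Hypothetical inputs, for illustration only.
commands = preprocessing_commands(
    "sample.markdup.bam", "reference.fasta", "known_sites.vcf.gz", "sample"
)
for cmd in commands:
    print(" ".join(cmd))
```

Building the commands as lists keeps each step inspectable before anything is executed, which is convenient when the same chain is later handed to `subprocess.run` or a workflow manager.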

(2) Variant discovery proceeds from analysis-ready BAM files and produces variant calls. This involves identifying genomic variation in one or more individuals and applying filtering methods appropriate to the experimental design. The output is typically in VCF format, although some classes of variants, such as copy number variants (CNVs), are difficult to represent in VCF and may therefore be represented in other structured text-based formats.
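
To make the VCF output mentioned above more concrete, here is a minimal sketch that parses the fixed leading columns of a VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER) and labels each REF/ALT pair as an SNV or an indel. The example records are made up, the names `Variant`, `parse_vcf_line` and `variant_type` are illustrative, and real pipelines would normally rely on a dedicated VCF library instead.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alts: List[str]
    filt: str


def parse_vcf_line(line):
    """Parse one tab-separated VCF data line (not a '#' header line)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    filt = fields[6] if len(fields) > 6 else "."
    return Variant(chrom, int(pos), ref, alt.split(","), filt)


def variant_type(ref, alt):
    """Classify a single REF/ALT pair as 'SNV', 'indel' or 'other'."""
    # Symbolic alleles (<DEL>), breakends and '*' are not simple sequences.
    if alt == "*" or alt.startswith("<") or "[" in alt or "]" in alt:
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


# Made-up records, for illustration only.
example_lines = [
    "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30",
    "chr2\t67890\t.\tCT\tC\t40\tPASS\tDP=25",
]
for line in example_lines:
    v = parse_vcf_line(line)
    print(v.chrom, v.pos, [variant_type(v.ref, a) for a in v.alts])
```

The `other` category reflects the limitation noted above: symbolic or structural alleles do not fit the simple REF/ALT sequence model well.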

(3) Additional steps such as filtering and annotation may be required to produce a callset ready for downstream genetic analysis, depending on the application. This typically involves using resources of known variation, truth sets and other metadata to assess and improve the accuracy of the results, and to attach additional information.
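
One simple way to assess accuracy against a truth set, as described above, is to count true positives, false positives and false negatives and derive precision and recall. The sketch below does this by exact matching on (chromosome, position, REF, ALT); the variants are made up, the function name `compare_to_truth` is illustrative, and exact matching is a simplifying assumption that ignores differences in how the same variant can be represented.

```python
def compare_to_truth(calls, truth):
    """Compare called variants with a truth set by exact key matching.

    Each variant is a (chrom, pos, ref, alt) tuple.
    """
    calls, truth = set(calls), set(truth)
    tp = len(calls & truth)   # called and in the truth set
    fp = len(calls - truth)   # called but not in the truth set
    fn = len(truth - calls)   # in the truth set but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Made-up variants, for illustration only.
truth_set = [
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "GA"),
]
call_set = [
    ("chr1", 100, "A", "G"),
    ("chr2", 50, "G", "GA"),
    ("chr3", 10, "T", "C"),
]
print(compare_to_truth(call_set, truth_set))
```

Comparing these metrics before and after a filtering step gives a rough indication of whether the filter removes more false positives than true variants.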

3. Clinical Whole Genome Sequencing Workflow

4. Experimental designs

| Strategy | Panel | Exome (WES) | Genome (WGS) |
|---|---|---|---|
| Size of target space (Mbp) | ~0.5 | ~50 | ~3200 |
| Average read depth | 500–1000× | 100–150× | ~30–60× |
| Relative cost | $ | $$ | $$$ |
| SNV/indel detection | ++ | ++ | ++ |
| CNV detection | + | + | ++ |
| SV detection | | | + |
| Low VAF | ++ | + | + |
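
The target sizes and read depths listed above can be combined into a back-of-the-envelope estimate of how much sequence each design needs (target size × mean depth) and of how many reads would be expected to support a variant at a given variant allele fraction (depth × VAF). This is a simplified sketch: it ignores off-target reads, duplicates, uneven coverage and sampling noise, and the 1% VAF used in the example is an arbitrary illustrative value.

```python
# Target size in Mbp and (low, high) mean read depth, taken from the table.
designs = {
    "Panel": {"target_mbp": 0.5, "depth": (500, 1000)},
    "Exome (WES)": {"target_mbp": 50, "depth": (100, 150)},
    "Genome (WGS)": {"target_mbp": 3200, "depth": (30, 60)},
}


def required_gbp(target_mbp, depth):
    """Approximate on-target sequence needed, in Gbp."""
    return target_mbp * depth / 1000


def expected_alt_reads(depth, vaf):
    """Expected number of reads carrying the variant allele."""
    return depth * vaf


for name, design in designs.items():
    low, high = design["depth"]
    gbp_low = required_gbp(design["target_mbp"], low)
    gbp_high = required_gbp(design["target_mbp"], high)
    alt_low = expected_alt_reads(low, 0.01)
    alt_high = expected_alt_reads(high, 0.01)
    print(f"{name}: {gbp_low:g}-{gbp_high:g} Gbp, "
          f"{alt_low:g}-{alt_high:g} expected alt reads at 1% VAF")
```

Under these assumptions a panel needs far less total sequence than a genome despite its much higher depth, and only the deeper designs are expected to yield more than a handful of reads supporting a low-VAF variant, which is consistent with the cost and Low VAF rows.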

Reference

https://gatk.broadinstitute.org/hc/en-us/sections/360007226651-Best-Practices-Workflows
https://www.nature.com/articles/s41525-022-00295-z
https://doi.org/10.1007/s00441-017-2636-6
https://genomemedicine.biomedcentral.com/counter/pdf/10.1186/s13073-020-00791-w.pdf
https://mp.weixin.qq.com/s/8bux7uTeZC5a23yVgExLIw

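Original-content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, contact cloudcommunity@tencent.com to request removal.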
