几点说明
1.非简单翻译,所有代码均可运行,为了辅助理解,基本每步代码都有结果,需要比较的进行了整合
2.原文中的软件都下载最新版本
3.原文中有少量代码是错误的,这里进行了修正
4.对于需要的一些知识背景,在这里进行了注释或链接到他人博客
一共4部分
上面已经得到的mutation files
需要用到前面的内容
cd src
curl -OL http://sourceforge.net/projects/samtools/files/samtools/1.9/bcftools-1.9.tar.bz2
tar jxvf bcftools-1.9.tar.bz2
cd bcftools-1.9
make
ln -s ~/src/bcftools-1.9/bcftools ~/bin
每个run的平均coverage是什么
默认,samtools depth只输出非0 coverage的区域
samtools depth bwa.bam | awk '{ sum += $3 } END { print sum/NR } '
73.8133
以下内容部分和前面重复
samtools view -h bow.bam |head| samtools view -h bow.bam |head|grep @SQ
@SQ SN:NC_002549 LN:18959
samtools depth bwa.bam | awk '{ sum += $3 } END { print sum/18959 } '
结果仍然是
73.7938
前已述及,不再展示图
samtools mpileup -Q 0 bwa.bam | more
samtools mpileup -f ~/refs/852/NC.fa -Q 0 bwa.bam | more
samtools tview bwa.bam
1 11 21 31 41 51 61 71
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
CACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAATTTTCCTCTC
CACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAATTTTCCTCTG
CACAAAAAGAAAGAAGAATTTTTAGGATCTTTAGTGTGCGAATAACTATGAGGAATATTAATAATTTACCTCTC
AAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAATTTTCCTCTC
AAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAATTTTCCTCTC
AAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAATTTTCCTCTC
GAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAAGTATGAGGAAGATTAATAATTTTCCTCTC
AGAAGAATTTTTAGGATCTTTTGTGTTCGAATAACTATGAGGAAGATTAATAATTTTCCTCTC
AAGAATTTTTAGGATCTTTTGTGTGGGAATAACTATGAGGAAGATTCAAAATTTTCCTCTC
AGAATTTTTAGTATCTTTTGAGTGCGACTAACTATGAGGAAGATTAATAATTTTCCTCTC
AATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAATTTTCCTCAC
AATTTTTAGGATCTTTTGTGTGCGAATCACTATGAGGAAGATTAATAATTTTCCTCTC
ATTATTAGGATCTTTTGTGTGCGAATAACTATGAGGACGATTAATAATTTTCCTCTC
TTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATACTTTTCCTCTC
TTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAATTTTCCTCTC
TTAGGATCTTTTGTGTGCGAATAACTATGATGAAGATTAATAATTTTCCTCTC
AGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAATTTTCCTCTC
TATTGTGTGCGAATAACTATGAGGAAGATTAATAATTTTCCTCTC
TTTTGTGTGCGAATAAGTATGAGGAAGATTAATAATTTTCCTCTC
TTTGTGTGCGAATAACTATGAGGAAGATTAATCATTTTCCTCTC
ATGTGTGCGAATAACTATGAGGAAGATTAATAATTTTCCTCTC
GTGCGAATAAGTATGCGGCAGATTAATAATTTTCCTCTC
TAACTATGAGGAAGATTAATAATTTTCCTCTC
TAACTATGAGGAAGATTAATAATTTTCCTCTC
CTATGAGGAAGATTAATAATTTTGCTCTC
ATGAGGAAGATTAATAATTTTCCTCTC
TGCGGAAGATTAATAATTTTCCTCTC
AGGAAGATAAATAATTTTCCTCTC
AGATTAATAATTTTCCTCTC
AGATTAATAATTTTCCTCTC
TTAATAATTTTCCTCTC
ATTTTCCTCTC
TTTTCCTCTC
TTTTCCTCTC
TTTCCTCTC
TCTC
CTG
C
samtools mpileup -Q 0 -f ~/refs/852/NC.fa -uv bwa.bam | more
部分结果如下
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT liver
NC_002549 5 . C <*> 0 . DP=1;I16=1,0,0,0,17,289,0,0,60,3600,0,0,0,0,0,0;QS=1,0;MQ0F=0 PL 0,3,17
NC_002549 6 . A <*> 0 . DP=1;I16=1,0,0,0,17,289,0,0,60,3600,0,0,1,1,0,0;QS=1,0;MQ0F=0 PL 0,3,17
NC_002549 7 . C <*> 0 . DP=2;I16=2,0,0,0,34,578,0,0,120,7200,0,0,2,4,0,0;QS=1,0;MQ0F=0 PL 0,6,31
NC_002549 8 . A <*> 0 . DP=2;I16=2,0,0,0,34,578,0,0,120,7200,0,0,4,10,0,0;QS=1,0;MQ0F=0 PL 0,6,31
NC_002549 9 . C <*> 0 . DP=2;I16=2,0,0,0,34,578,0,0,120,7200,0,0,6,20,0,0;QS=1,0;MQ0F=0 PL 0,6,31
NC_002549 10 . A <*> 0 . DP=2;I16=2,0,0,0,34,578,0,0,120,7200,0,0,8,34,0,0;QS=1,0;MQ0F=0 PL 0,6,31
NC_002549 11 . A <*> 0 . DP=3;I16=3,0,0,0,51,867,0,0,180,10800,0,0,10,52,0,0;QS=1,0;MQ0F=0 PL 0,9,42
NC_002549 12 . A <*> 0 . DP=4;I16=4,0,0,0,68,1156,0,0,240,14400,0,0,13,75,0,0;QS=1,0;MQ0F=0 PL 0,12,50
NC_002549 13 . A <*> 0 . DP=5;I16=5,0,0,0,85,1445,0,0,300,18000,0,0,17,105,0,0;QS=1,0;MQ0F=0 PL 0,15,57
NC_002549 14 . A <*> 0 . DP=5;I16=5,0,0,0,85,1445,0,0,300,18000,0,0,22,144,0,0;QS=1,0;MQ0F=0 PL 0,15,57
NC_002549 15 . G <*> 0 . DP=6;I16=6,0,0,0,102,1734,0,0,360,21600,0,0,27,193,0,0;QS=1,0;MQ0F=0 PL 0,18,62
NC_002549 16 . A <*> 0 . DP=6;I16=6,0,0,0,102,1734,0,0,360,21600,0,0,33,253,0,0;QS=1,0;MQ0F=0 PL 0,18,62
NC_002549 17 . A <*> 0 . DP=6;I16=6,0,0,0,102,1734,0,0,360,21600,0,0,39,325,0,0;QS=1,0;MQ0F=0 PL 0,18,62
NC_002549 18 . A <*> 0 . DP=7;I16=7,0,0,0,119,2023,0,0,420,25200,0,0,45,409,0,0;QS=1,0;MQ0F=0 PL 0,21,66
NC_002549 19 . G <*> 0 . DP=7;I16=7,0,0,0,119,2023,0,0,420,25200,0,0,52,506,0,0;QS=1,0;MQ0F=0 PL 0,21,66
NC_002549 20 . A <*> 0 . DP=8;I16=8,0,0,0,136,2312,0,0,480,28800,0,0,59,617,0,0;QS=1,0;MQ0F=0 PL 0,24,69
NC_002549 21 . A <*> 0 . DP=9;I16=9,0,0,0,153,2601,0,0,540,32400,0,0,67,743,0,0;QS=1,0;MQ0F=0 PL 0,27,72
samtools mpileup -Q 0 -ugf ~/refs/852/NC.fa bwa.bam | bcftools call -vc > samtools.vcf
查看snp calls,知道snp calling的原理
这里原作者写了一个python脚本名字为snpcaller.py.但我还没找到
代码先放这里,但不影响后续分析。做个
cat bwa.pileup | python snpcaller.py > diy.vcf
cat samtools.vcf |grep -v "##"|cut -f 1-5|head
#CHROM POS ID REF ALT
NC_002549 425 . G T,A
NC_002549 507 . G T
NC_002549 2846 . a aT
NC_002549 2847 . g gT
NC_002549 7894 . A C
NC_002549 9569 . G A
NC_002549 12101 . T A
NC_002549 12957 . G A
NC_002549 14086 . A G