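The CNVkit coverage command takes sequencing read alignments in BAM format and the positions of on- or off-target bins in BED or interval list format, and calculates the log2 mean read depth in each bin for a sample. For each bin, pysam is used to calculate and sum the read depth at every base pair in the bin, and the sum is then divided by the size of the bin. The output is a table of the mean read depth of each given bin, log2-transformed and centered to the median read depth of all autosomes.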
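The per-bin arithmetic described above can be sketched in a few lines of plain Python. This is a minimal illustration, not CNVkit's implementation: the per-base depths are passed in as plain lists rather than read from a BAM file with pysam, and the names `bin_mean_depth`, `log2_centered`, `SEX_CHROMOSOMES` and `DEPTH_FLOOR` are hypothetical. The depth floor is an assumption added only so that empty bins do not produce `log2(0)`.

```python
import math
from statistics import median

# Assumption: chromosome naming and the depth floor are illustrative choices.
SEX_CHROMOSOMES = {"chrX", "chrY", "X", "Y"}
DEPTH_FLOOR = 1e-6  # avoids log2(0) for bins with no coverage


def bin_mean_depth(per_base_depths):
    """Sum the read depth at every base in the bin, then divide by bin size."""
    return sum(per_base_depths) / len(per_base_depths)


def log2_centered(bins):
    """bins: list of (chromosome, mean_depth) pairs.

    Returns log2 mean depths centered on the median mean depth of the
    autosomal bins, so a bin at the autosomal median gets the value 0.
    """
    autosomal = [depth for chrom, depth in bins if chrom not in SEX_CHROMOSOMES]
    center = median(autosomal)
    return [math.log2(max(depth, DEPTH_FLOOR) / center) for _, depth in bins]


example_bins = [("chr1", 100.0), ("chr2", 200.0), ("chr3", 400.0), ("chrX", 100.0)]
centered = log2_centered(example_bins)
```

In this toy input the autosomal median depth is 200, so the bin at depth 200 is centered at 0 and the bins at half and double that depth land one log2 unit below and above it.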
Construction of a copy number reference
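In each genomic bin, the read depths of every given control sample are extracted. Read-depth bias corrections are applied to each control sample. In each bin, a weighted average of the log2 read depths among the control samples is calculated to flag bins that have higher or lower coverage, and the spread or statistical dispersion of the log2 read depths indicates bins that have erratic coverage, so that they can be de-emphasized at the segmentation step. A single paired control sample can also be used, or, in the absence of any control samples, a "generic" reference can be constructed with a log2 read depth and spread of 0 assigned to all bins. In all cases, a "male reference" can be specified, in which the X-chromosome bins have an expected read depth half that of the autosomes.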
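As a rough sketch of the reference construction described above, the following computes a per-bin weighted average and spread across control samples, and builds a "generic" reference. Several things here are assumptions for illustration: the function names are hypothetical, the spread is a weighted standard deviation standing in for whatever dispersion estimator is actually used, and the male-reference value of -1 for X bins simply follows from "half the read depth" (log2 of 0.5).

```python
import math


def weighted_mean(values, weights):
    """Weighted average of values; weights are assumed positive."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_spread(values, weights):
    """Weighted standard deviation (assumed stand-in for the dispersion estimate)."""
    mu = weighted_mean(values, weights)
    return math.sqrt(weighted_mean([(v - mu) ** 2 for v in values], weights))


def pooled_reference(log2_by_sample, sample_weights):
    """log2_by_sample: one list of per-bin log2 depths per control sample.

    Returns one (log2, spread) pair per bin.
    """
    return [
        (weighted_mean(per_bin, sample_weights), weighted_spread(per_bin, sample_weights))
        for per_bin in zip(*log2_by_sample)
    ]


def generic_reference(bin_chromosomes, male_reference=False):
    """Reference without controls: log2 depth 0 and spread 0 for every bin.

    With a male reference, X bins are expected at half depth, i.e. log2 = -1.
    """
    return [
        (-1.0 if male_reference and chrom in ("chrX", "X") else 0.0, 0.0)
        for chrom in bin_chromosomes
    ]


controls = [[0.0, 1.0], [0.0, -1.0]]  # two control samples, two bins
reference = pooled_reference(controls, [1.0, 1.0])
```

The second bin in this toy input averages to 0 but has a spread of 1, which is the kind of erratic bin that would be de-emphasized during segmentation.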
For hybrid capture, if targets are not tiled with uniform density (for example, the target panel is designed with a subset of targets having twice or half the usual number of tiles for a fixed number of genomic bases), nothing special needs to be done to compensate for this, as long as you are using a pooled reference. When the test sample's read depths are normalized to the pooled reference, the log2 ratios will even out.
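A small worked example shows why uneven tiling cancels out. The depths below are invented for illustration, and `log2_ratios` is a hypothetical helper: a double-tiled target has roughly twice the depth in both the test sample and the pooled reference, so the ratio between them is unaffected.

```python
import math


def log2_ratios(sample_depth, reference_depth):
    """Per-target log2 ratio of sample depth to pooled-reference depth."""
    return {
        name: math.log2(sample_depth[name] / reference_depth[name])
        for name in sample_depth
    }


# Invented depths: the double-tiled target captures twice as many reads
# in the pooled reference and in a copy-neutral test sample alike.
reference = {"normal_target": 100.0, "double_tiled_target": 200.0}
neutral_sample = {"normal_target": 100.0, "double_tiled_target": 200.0}

ratios = log2_ratios(neutral_sample, reference)
```

Both targets come out at a log2 ratio of 0 in the copy-neutral sample, even though their raw depths differ by a factor of two.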
Filtering segments
Copy number value (cn), merging adjacent segments with the same called value.
Keeping only high-level amplifications (5 copies or more) and homozygous deletions (0 copies) (ampdel).
Confidence interval overlapping zero (ci).
Standard error of the mean (sem), a parametric estimate of confidence intervals which behaves similarly.
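The criteria listed above could be expressed roughly as follows. This is a sketch under stated assumptions, not CNVkit's code: segments are plain dictionaries sorted by position, the field names (`cn`, `ci_lo`, `ci_hi`, `sem`) and function names are hypothetical, and the `z=1.96` multiplier for the SEM-based interval is an assumed choice.

```python
def merge_same_cn(segments):
    """cn: merge adjacent segments on the same chromosome with equal called cn."""
    merged = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if prev and prev["chromosome"] == seg["chromosome"] and prev["cn"] == seg["cn"]:
            prev["end"] = seg["end"]  # extend the previous merged segment
        else:
            merged.append({k: seg[k] for k in ("chromosome", "start", "end", "cn")})
    return merged


def keep_ampdel(segments):
    """ampdel: keep high-level amplifications (>= 5 copies) and homozygous deletions (0)."""
    return [seg for seg in segments if seg["cn"] >= 5 or seg["cn"] == 0]


def ci_overlaps_zero(seg):
    """ci: True when the segment's confidence interval includes a log2 ratio of 0."""
    return seg["ci_lo"] <= 0.0 <= seg["ci_hi"]


def sem_overlaps_zero(seg, z=1.96):
    """sem: parametric analogue, using log2 +/- z * SEM as the interval (z is assumed)."""
    return abs(seg["log2"]) <= z * seg["sem"]


segments = [
    {"chromosome": "chr1", "start": 0, "end": 100, "log2": 0.0, "cn": 2,
     "ci_lo": -0.1, "ci_hi": 0.1, "sem": 0.05},
    {"chromosome": "chr1", "start": 100, "end": 200, "log2": 0.05, "cn": 2,
     "ci_lo": -0.05, "ci_hi": 0.15, "sem": 0.05},
    {"chromosome": "chr1", "start": 200, "end": 300, "log2": 1.5, "cn": 6,
     "ci_lo": 1.2, "ci_hi": 1.8, "sem": 0.15},
]
merged = merge_same_cn(segments)
```

Here the first two segments share a called copy number of 2 and collapse into one, while the amplified third segment is the only one that survives the ampdel filter.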
gene, chromosome – as in .cns, the gene where the breakpoint occurs and the chromosome it lies on.
location – the end of the segment to the left of the breakpoint, and start of the segment to the right.
change – the difference in log2 values between the adjacent segments.
probes_left, probes_right – the number of probes on each side of the breakpoint within the gene. (Not the same as the number of probes supporting each segment; just the portion within the gene.)
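To make the column definitions above concrete, here is a hedged sketch of how such a row might be derived from two adjacent segments and the bins of one gene. The function name and dictionary layout are hypothetical, and the sign convention for `change` (right minus left) is an assumption, since the text only says "difference".

```python
def describe_breakpoint(left_seg, right_seg, gene_bins):
    """Build location, change and per-side probe counts for one breakpoint.

    left_seg / right_seg: adjacent segments with "start", "end", "log2".
    gene_bins: the bins (probes) belonging to the gene, with "start" and "end".
    """
    return {
        # End of the segment on the left, start of the segment on the right.
        "location": (left_seg["end"], right_seg["start"]),
        # Assumed sign convention: right-hand log2 minus left-hand log2.
        "change": right_seg["log2"] - left_seg["log2"],
        # Only probes inside the gene are counted, not all probes in each segment.
        "probes_left": sum(1 for b in gene_bins if b["end"] <= left_seg["end"]),
        "probes_right": sum(1 for b in gene_bins if b["start"] >= right_seg["start"]),
    }


left = {"start": 0, "end": 1000, "log2": 0.0}
right = {"start": 1000, "end": 5000, "log2": 0.5}
gene_bins = [
    {"start": 800, "end": 900},
    {"start": 900, "end": 1000},
    {"start": 1000, "end": 1100},
]
row = describe_breakpoint(left, right, gene_bins)
```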
genemetrics
Identify targeted genes with copy number gain or loss above or below a threshold.
The remaining output columns have slightly different meaning depending on whether or not segments were provided. Without segments (.cnr alone):
log2: Weighted mean of log2 ratios of all the gene’s bins, including any off-target intronic bins.
depth: Weighted mean of un-normalized read depths across all this gene’s bins.
weight: Sum of this gene’s bins’ weights.
nbins: The number of bins assigned to this gene.
With segments (-s):
log2: The log2 ratio value of the segment covering the gene, i.e. weighted mean of all bins covered by the whole segment, not just this gene.
depth, weight, probes: As above.
seg_weight: The sum of the weights of the bins supporting the segment.
seg_probes: The number of probes supporting the segment.
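For the case without segments, the gene-level columns can be sketched as a weighted aggregation over each gene's bins, followed by the threshold test. This is an illustration only: `gene_metrics` and the bin dictionary layout are hypothetical, the threshold is passed explicitly rather than assuming any default, and bin weights are assumed to be positive.

```python
from collections import defaultdict


def gene_metrics(bins, threshold):
    """Aggregate bins per gene and keep genes whose |log2| reaches the threshold.

    bins: dicts with "gene", "log2", "depth" and "weight" (assumed positive).
    """
    by_gene = defaultdict(list)
    for b in bins:
        by_gene[b["gene"]].append(b)

    rows = []
    for gene, gene_bins in by_gene.items():
        total_weight = sum(b["weight"] for b in gene_bins)
        # Weighted means over all of the gene's bins.
        log2 = sum(b["log2"] * b["weight"] for b in gene_bins) / total_weight
        depth = sum(b["depth"] * b["weight"] for b in gene_bins) / total_weight
        if abs(log2) >= threshold:  # gain or loss beyond the threshold
            rows.append({
                "gene": gene,
                "log2": log2,
                "depth": depth,
                "weight": total_weight,
                "nbins": len(gene_bins),
            })
    return rows


example_bins = [
    {"gene": "A", "log2": 1.0, "depth": 100.0, "weight": 1.0},
    {"gene": "A", "log2": 0.0, "depth": 200.0, "weight": 1.0},
    {"gene": "B", "log2": 0.0, "depth": 50.0, "weight": 1.0},
]
reported = gene_metrics(example_bins, threshold=0.25)
```

With segments supplied, the log2 value would instead come from the segment covering the gene, as described in the second list above; that variant is not shown here.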