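This step could have gone in the QC section, but it takes so long to run that I split it out on its own; it is not counted in the overall section numbering.
So I tested it on a single sample only. The complete report follows.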
(wes) pc@lab-pc:/project/raw_fq$ trim_galore -q 25 --phred33 --length 36 -e 0.1 --stringency 3 --paired -o ./try SRR8707702_1.fastq.gz SRR8707702_2.fastq.gz
Multicore support not enabled. Proceeding with single-core trimming.
Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Cutadapt version: 1.18
single-core operation.
Output will be written into the directory: /project/raw_fq/try/
AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> SRR8707702_1.fastq.gz <<)
Found perfect matches for the following adapter sequences:
Adapter type    Count     Sequence         Sequences analysed    Percentage
Illumina        161502    AGATCGGAAGAGC    1000000               16.15
Nextera         9         CTGTCTCTTATA     1000000               0.00
smallRNA        4         TGGAATTCTCGG     1000000               0.00
Using Illumina adapter for trimming (count: 161502). Second best hit was Nextera (count: 9)
Writing report to '/project/raw_fq/try/SRR8707702_1.fastq.gz_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: SRR8707702_1.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.2
Cutadapt version: 1.18
Number of cores used for trimming: 1
Quality Phred score cutoff: 25
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 3 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 36 bp
Output file(s) will be GZIP compressed
Cutadapt seems to be reasonably up-to-date. Setting -j 1
Writing final adapter and quality trimmed output to SRR8707702_1_trimmed.fq.gz
>>> Now performing quality (cutoff '-q 25') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file SRR8707702_1.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
This is cutadapt 1.18 with Python 2.7.16
Command line parameters: -j 1 -e 0.1 -q 25 -O 3 -a AGATCGGAAGAGC SRR8707702_1.fastq.gz
Processing reads on 1 core in single-end mode ...
Finished in 943.31 s (22 us/read; 2.70 M reads/minute).
=== Summary ===
Total reads processed: 42,376,519
Reads with adapters: 10,163,308 (24.0%)
Reads written (passing filters): 42,376,519 (100.0%)
Total basepairs processed: 6,356,477,850 bp
Quality-trimmed: 21,437,764 bp (0.3%)
Total written (filtered): 6,060,614,857 bp (95.3%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 10163308 times.
No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
A: 18.0%
C: 28.4%
G: 28.2%
T: 25.3%
none/other: 0.0%
Overview of removed sequences
length count expect max.err error counts
3 893337 662133.1 0 893337
4 347466 165533.3 0 347466
5 246231 41383.3 0 246231
6 219213 10345.8 0 219213
7 215812 2586.5 0 215812
8 207139 646.6 0 207139
9 207281 161.7 0 206446 835
10 210473 40.4 1 203701 6772
11 203354 10.1 1 196321 7033
12 202462 2.5 1 195635 6827
13 203145 0.6 1 195859 7286
14 199398 0.6 1 191421 7977
15 196562 0.6 1 188247 8315
16 194391 0.6 1 186649 7742
17 197618 0.6 1 189390 8228
18 193231 0.6 1 186187 7044
19 186830 0.6 1 179677 7153
20 183871 0.6 1 176979 6892
21 186449 0.6 1 179368 7081
22 181859 0.6 1 175869 5990
23 174628 0.6 1 169266 5362
24 174634 0.6 1 169234 5400
25 173929 0.6 1 168479 5450
26 167869 0.6 1 162906 4963
27 166856 0.6 1 161982 4874
28 164736 0.6 1 160162 4574
29 161764 0.6 1 157387 4377
30 159685 0.6 1 155414 4271
31 151682 0.6 1 147797 3885
32 150497 0.6 1 146712 3785
33 146601 0.6 1 142975 3626
34 148207 0.6 1 144295 3912
35 145099 0.6 1 141289 3810
36 140649 0.6 1 136941 3708
37 136431 0.6 1 132871 3560
38 133370 0.6 1 129846 3524
39 132927 0.6 1 129325 3602
40 128969 0.6 1 125669 3300
41 130248 0.6 1 126709 3539
42 123247 0.6 1 120046 3201
43 123939 0.6 1 120738 3201
44 113487 0.6 1 110809 2678
45 157959 0.6 1 154315 3644
46 73403 0.6 1 71690 1713
47 89399 0.6 1 87311 2088
48 97120 0.6 1 94878 2242
49 96707 0.6 1 94398 2309
50 93432 0.6 1 91317 2115
51 92226 0.6 1 90028 2198
52 85858 0.6 1 83891 1967
53 84809 0.6 1 82872 1937
54 81733 0.6 1 79742 1991
55 81465 0.6 1 79537 1928
56 70456 0.6 1 68901 1555
57 71444 0.6 1 69799 1645
58 66980 0.6 1 65412 1568
59 62219 0.6 1 60844 1375
60 59960 0.6 1 58641 1319
61 56227 0.6 1 54820 1407
62 52534 0.6 1 51334 1200
63 53419 0.6 1 52102 1317
64 48620 0.6 1 47400 1220
65 42972 0.6 1 41949 1023
66 44404 0.6 1 43376 1028
67 38178 0.6 1 37247 931
68 35480 0.6 1 34649 831
69 34346 0.6 1 33508 838
70 33181 0.6 1 32273 908
71 36212 0.6 1 32873 3339
72 164077 0.6 1 161597 2480
73 7643 0.6 1 7147 496
74 1612 0.6 1 1497 115
75 1340 0.6 1 1241 99
76 1371 0.6 1 1278 93
77 1582 0.6 1 1481 101
78 1668 0.6 1 1574 94
79 1759 0.6 1 1657 102
80 1777 0.6 1 1656 121
81 1705 0.6 1 1583 122
82 1466 0.6 1 1365 101
83 1247 0.6 1 1182 65
84 1060 0.6 1 988 72
85 823 0.6 1 768 55
86 658 0.6 1 598 60
87 600 0.6 1 526 74
88 538 0.6 1 470 68
89 397 0.6 1 345 52
90 404 0.6 1 333 71
91 373 0.6 1 306 67
92 308 0.6 1 249 59
93 247 0.6 1 194 53
94 179 0.6 1 136 43
95 214 0.6 1 151 63
96 181 0.6 1 135 46
97 153 0.6 1 106 47
98 127 0.6 1 78 49
99 125 0.6 1 82 43
100 111 0.6 1 69 42
101 127 0.6 1 70 57
102 114 0.6 1 54 60
103 97 0.6 1 54 43
104 117 0.6 1 44 73
105 94 0.6 1 45 49
106 90 0.6 1 45 45
107 86 0.6 1 39 47
108 72 0.6 1 38 34
109 82 0.6 1 44 38
110 87 0.6 1 42 45
111 71 0.6 1 31 40
112 57 0.6 1 25 32
113 77 0.6 1 23 54
114 65 0.6 1 25 40
115 79 0.6 1 33 46
116 73 0.6 1 24 49
117 70 0.6 1 23 47
118 69 0.6 1 25 44
119 49 0.6 1 16 33
120 59 0.6 1 23 36
121 51 0.6 1 17 34
122 60 0.6 1 23 37
123 56 0.6 1 12 44
124 49 0.6 1 14 35
125 56 0.6 1 14 42
126 54 0.6 1 20 34
127 51 0.6 1 13 38
128 44 0.6 1 10 34
129 37 0.6 1 6 31
130 52 0.6 1 10 42
131 35 0.6 1 6 29
132 38 0.6 1 3 35
133 25 0.6 1 0 25
134 38 0.6 1 3 35
135 33 0.6 1 4 29
136 36 0.6 1 2 34
137 34 0.6 1 3 31
138 39 0.6 1 3 36
139 45 0.6 1 2 43
140 34 0.6 1 1 33
141 31 0.6 1 0 31
142 35 0.6 1 2 33
143 33 0.6 1 0 33
144 31 0.6 1 0 31
145 55 0.6 1 0 55
146 36 0.6 1 1 35
147 26 0.6 1 0 26
148 56 0.6 1 1 55
149 92 0.6 1 0 92
150 347 0.6 1 0 347
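A note on reading the table above: the `expect` and `max.err` columns can be reproduced from figures already in this log. The sketch below assumes that `expect` is the number of reads expected to match the first `length` adapter bases purely by chance (each base matching with probability 1/4, capped at the 13 bp adapter length) and that `max.err` is the error rate times the matched length, rounded down. This is a reconstruction that agrees with the printed numbers, not code taken from cutadapt; the function names are made up for illustration.

```python
# Values copied from the log above.
ADAPTER = "AGATCGGAAGAGC"   # 13 bp Illumina adapter
TOTAL_READS = 42_376_519    # "Total reads processed"
ERROR_RATE = 0.1            # "-e 0.1"


def expected_random_matches(length, total_reads=TOTAL_READS,
                            adapter_len=len(ADAPTER)):
    """Reads expected to match `length` adapter bases by chance alone.

    Assumption: each base matches with probability 1/4, and matches longer
    than the adapter are no less likely than a full-length match.
    """
    return total_reads * 0.25 ** min(length, adapter_len)


def max_errors(length, error_rate=ERROR_RATE, adapter_len=len(ADAPTER)):
    """Allowed mismatches for a match of `length` bases (rounded down)."""
    return int(error_rate * min(length, adapter_len))


for length in (3, 4, 9, 10, 13, 72):
    print(length,
          round(expected_random_matches(length), 1),
          max_errors(length))
```

Under these assumptions, the 893337 removals of length 3 sit close to the 662133.1 expected by chance, so most of those very short trims are probably random matches. From about 10 bp upward the expected count drops to a handful or less while the observed counts stay near 200000, which points to genuine adapter read-through.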
RUN STATISTICS FOR INPUT FILE: SRR8707702_1.fastq.gz
=============================================
42376519 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)
Writing report to '/project/raw_fq/try/SRR8707702_2.fastq.gz_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: SRR8707702_2.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.2
Cutadapt version: 1.18
Number of cores used for trimming: 1
Quality Phred score cutoff: 25
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 3 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 36 bp
Output file(s) will be GZIP compressed
Cutadapt seems to be reasonably up-to-date. Setting -j -j 1
Writing final adapter and quality trimmed output to SRR8707702_2_trimmed.fq.gz
>>> Now performing quality (cutoff '-q 25') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file SRR8707702_2.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
This is cutadapt 1.18 with Python 2.7.16
Command line parameters: -j 1 -e 0.1 -q 25 -O 3 -a AGATCGGAAGAGC SRR8707702_2.fastq.gz
Processing reads on 1 core in single-end mode ...
Finished in 999.16 s (24 us/read; 2.54 M reads/minute).
=== Summary ===
Total reads processed: 42,376,519
Reads with adapters: 10,033,608 (23.7%)
Reads written (passing filters): 42,376,519 (100.0%)
Total basepairs processed: 6,356,477,850 bp
Quality-trimmed: 103,550,161 bp (1.6%)
Total written (filtered): 5,984,283,082 bp (94.1%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 10033608 times.
No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
A: 18.7%
C: 29.1%
G: 27.4%
T: 24.8%
none/other: 0.0%
Overview of removed sequences
length count expect max.err error counts
3 883945 662133.1 0 883945
4 342881 165533.3 0 342881
5 250071 41383.3 0 250071
6 218735 10345.8 0 218735
7 210577 2586.5 0 210577
8 203032 646.6 0 203032
9 206182 161.7 0 203571 2611
10 210507 40.4 1 200692 9815
11 202270 10.1 1 191889 10381
12 201597 2.5 1 189934 11663
13 205749 0.6 1 192641 13108
14 221282 0.6 1 207596 13686
15 178072 0.6 1 167615 10457
16 194599 0.6 1 181966 12633
17 202540 0.6 1 190530 12010
18 169645 0.6 1 159683 9962
19 189212 0.6 1 178213 10999
20 173233 0.6 1 163468 9765
21 174462 0.6 1 164416 10046
22 176378 0.6 1 166218 10160
23 175789 0.6 1 165473 10316
24 184136 0.6 1 173185 10951
25 160030 0.6 1 151040 8990
26 163402 0.6 1 154332 9070
27 161455 0.6 1 152661 8794
28 163063 0.6 1 154670 8393
29 155863 0.6 1 147789 8074
30 163824 0.6 1 155172 8652
31 144071 0.6 1 136768 7303
32 147455 0.6 1 139784 7671
33 147065 0.6 1 139388 7677
34 147142 0.6 1 139737 7405
35 140542 0.6 1 133538 7004
36 137662 0.6 1 130722 6940
37 135790 0.6 1 129184 6606
38 132764 0.6 1 126241 6523
39 128553 0.6 1 122466 6087
40 129045 0.6 1 122642 6403
41 126068 0.6 1 119859 6209
42 121518 0.6 1 115237 6281
43 113109 0.6 1 107527 5582
44 116976 0.6 1 111295 5681
45 107814 0.6 1 102651 5163
46 105288 0.6 1 100548 4740
47 102650 0.6 1 97588 5062
48 103068 0.6 1 97886 5182
49 96017 0.6 1 90973 5044
50 95358 0.6 1 89955 5403
51 101937 0.6 1 97007 4930
52 72139 0.6 1 68343 3796
53 82879 0.6 1 78844 4035
54 74680 0.6 1 70731 3949
55 77624 0.6 1 73672 3952
56 73463 0.6 1 69959 3504
57 71484 0.6 1 67933 3551
58 67679 0.6 1 64415 3264
59 65277 0.6 1 62260 3017
60 62316 0.6 1 59174 3142
61 59439 0.6 1 56681 2758
62 58942 0.6 1 56276 2666
63 72060 0.6 1 61639 10421
64 400781 0.6 1 391238 9543
65 16165 0.6 1 15398 767
66 3813 0.6 1 3554 259
67 3171 0.6 1 2963 208
68 3216 0.6 1 3023 193
69 3183 0.6 1 2930 253
70 3233 0.6 1 3023 210
71 3436 0.6 1 3213 223
72 4087 0.6 1 3831 256
73 3774 0.6 1 3526 248
74 3963 0.6 1 3711 252
75 3115 0.6 1 2911 204
76 2630 0.6 1 2442 188
77 2229 0.6 1 2052 177
78 1622 0.6 1 1483 139
79 1344 0.6 1 1198 146
80 1222 0.6 1 1081 141
81 1021 0.6 1 899 122
82 934 0.6 1 819 115
83 847 0.6 1 754 93
84 679 0.6 1 581 98
85 594 0.6 1 492 102
86 500 0.6 1 410 90
87 478 0.6 1 383 95
88 407 0.6 1 310 97
89 362 0.6 1 270 92
90 332 0.6 1 254 78
91 285 0.6 1 203 82
92 274 0.6 1 195 79
93 251 0.6 1 175 76
94 207 0.6 1 121 86
95 187 0.6 1 110 77
96 215 0.6 1 122 93
97 193 0.6 1 116 77
98 163 0.6 1 95 68
99 151 0.6 1 83 68
100 139 0.6 1 70 69
101 172 0.6 1 91 81
102 135 0.6 1 78 57
103 120 0.6 1 51 69
104 117 0.6 1 69 48
105 128 0.6 1 63 65
106 117 0.6 1 57 60
107 106 0.6 1 48 58
108 115 0.6 1 47 68
109 112 0.6 1 65 47
110 105 0.6 1 44 61
111 108 0.6 1 46 62
112 96 0.6 1 31 65
113 98 0.6 1 43 55
114 101 0.6 1 37 64
115 90 0.6 1 36 54
116 88 0.6 1 38 50
117 84 0.6 1 38 46
118 87 0.6 1 30 57
119 97 0.6 1 35 62
120 74 0.6 1 32 42
121 113 0.6 1 53 60
122 73 0.6 1 26 47
123 79 0.6 1 29 50
124 67 0.6 1 30 37
125 66 0.6 1 22 44
126 67 0.6 1 16 51
127 65 0.6 1 13 52
128 44 0.6 1 9 35
129 48 0.6 1 10 38
130 45 0.6 1 13 32
131 36 0.6 1 6 30
132 50 0.6 1 4 46
133 49 0.6 1 3 46
134 41 0.6 1 2 39
135 49 0.6 1 1 48
136 54 0.6 1 1 53
137 53 0.6 1 4 49
138 60 0.6 1 2 58
139 41 0.6 1 1 40
140 46 0.6 1 0 46
141 37 0.6 1 0 37
142 49 0.6 1 1 48
143 39 0.6 1 0 39
144 58 0.6 1 0 58
145 58 0.6 1 0 58
146 59 0.6 1 0 59
147 53 0.6 1 0 53
148 62 0.6 1 0 62
149 82 0.6 1 0 82
150 207 0.6 1 2 205
RUN STATISTICS FOR INPUT FILE: SRR8707702_2.fastq.gz
=============================================
42376519 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)
Validate paired-end files SRR8707702_1_trimmed.fq.gz and SRR8707702_2_trimmed.fq.gz
file_1: SRR8707702_1_trimmed.fq.gz, file_2: SRR8707702_2_trimmed.fq.gz
>>>>> Now validing the length of the 2 paired-end infiles: SRR8707702_1_trimmed.fq.gz and SRR8707702_2_trimmed.fq.gz <<<<<
Writing validated paired-end read 1 reads to SRR8707702_1_val_1.fq.gz
Writing validated paired-end read 2 reads to SRR8707702_2_val_2.fq.gz
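
To make the two cutadapt summaries above easier to compare, and to put a number on how slow the single-core run was, here is a small sketch that re-derives the percentages and throughput from the raw counts in the log. All input values are copied from the report; the dictionary layout and the `summarise` helper are only illustrative.

```python
# Raw figures copied from the two "=== Summary ===" blocks in the log.
TOTAL_READS = 42_376_519

STATS = {
    "SRR8707702_1": {
        "with_adapters": 10_163_308,
        "bp_in": 6_356_477_850,
        "bp_quality_trimmed": 21_437_764,
        "bp_out": 6_060_614_857,
        "seconds": 943.31,
    },
    "SRR8707702_2": {
        "with_adapters": 10_033_608,
        "bp_in": 6_356_477_850,
        "bp_quality_trimmed": 103_550_161,
        "bp_out": 5_984_283_082,
        "seconds": 999.16,
    },
}


def summarise(s, total_reads=TOTAL_READS):
    """Recompute the derived figures cutadapt prints next to the raw counts."""
    return {
        "read_length": s["bp_in"] / total_reads,
        "adapter_pct": round(100 * s["with_adapters"] / total_reads, 1),
        "quality_trimmed_pct": round(100 * s["bp_quality_trimmed"] / s["bp_in"], 1),
        "written_pct": round(100 * s["bp_out"] / s["bp_in"], 1),
        "mreads_per_min": round(total_reads / s["seconds"] * 60 / 1e6, 2),
    }


for name, s in STATS.items():
    print(name, summarise(s))

total_minutes = sum(s["seconds"] for s in STATS.values()) / 60
print(f"cutadapt time for both files: {total_minutes:.1f} min")
```

The recomputed values match the log: 150 bp reads, about 24% of reads carrying adapter sequence in each file, and noticeably more quality trimming on read 2 (1.6%) than on read 1 (0.3%). The two cutadapt passes alone add up to roughly 32 minutes on one core for this single sample, not counting the paired-end validation step.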