ClipAndMerge is a tool published on GitHub by Alexander Peltzer that performs adapter trimming, quality trimming, and paired-end read merging in a single run. A one-liner is enough to get merged fastq output.
Installation
Tested on macOS 10.14 in a miniconda3-4.2.12 environment.
Main repository: GitHub
# Install with conda in an Anaconda environment
conda install -c bioconda -y ClipAndMerge
$ ClipAndMerge -h
ClipAndMerge (v. 1.7.8)
Integrative Transcriptomics
University of Tübingen
Author: Günter Jäger
This tool clips adapters from fastq sequences and merges overlapping regions from forward and reverse reads.
Input sequences are accepted in fastq, or in gzipped fastq format.
Option "-in1" is required
java -jar ClipAndMerge.jar [options...]
Example: java -jar ClipAndMerge.jar -in1 STRING
-discardBadReads : Discard reads after merging that do not fulfill the quality criteria. (default: false)
-e DOUBLE : Error rate for merging forward and reverse reads. A value of 0.05 means that 5% mismatches are allowed in the overlap region. (default: 0.05)
-f FORWARD_ADAPTER_STRING : Forward reads adapter sequence. (default: AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC)
-h : Display this help page and exit. (default: true)
-in1 STRING : Forward reads input file(s) in fastq(.gz) file format.
-in2 STRING[] : Reverse reads input file(s) in fastq(.gz) file format.
-l INTEGER : Discard sequences shorter than this number of nucleotides after adapter clipping. (default: 25)
-lastBase INTEGER : Reads are trimmed from the 3' end until given value is reached. Trimming is not performed if read is already <= given value. If this option is given the '-trim3p' option is disregarded! Given value sould be 1-based! (default: 2147483647)
-log LOG_FILE_STRING : Write log messages to a file instead of the standard error stream.
-m INTEGER : Require a minimum adapter alignment length. If less nucleotides align with the adapter, the sequences are not clipped. (default: 8)
-maxParallelReads NUM_READS_INTEGER : Maximal number of reads, that can be processed in parallel. This number largely depends on the processing system settings! Only change it if you know what you are doing! (default: 1000)
-minQualBadReads INTEGER : Minimal base quality for keeping bad reads. If 0 is specified, then all reads are kept. (default: 0)
-n : Discard sequences with unknown (N) nucleotides. Default is to keep such sequences. (default: false)
-no_clip_stats : Disable the display of clipping statistics. (default: false)
-no_clipping : Skip adapter clipping. Only read merging is performed! (This is only recommended if every forward and reverse read has a corresponding partner in the other respective fastq-file! Otherwise merging can not be performed correctly. (default: false)
-no_merging : Skip read merging for paired-end sequencing data! Only adapter clipping is performed. This parameter is not needed for single-end data. (default: false)
-no_qbMM : Do not perform quality based mismatch calculation for merging. Default is to take quality scores into account. (default: false)
-o OUTPUT_FILE_STRING : Output file. If no file is provided, output will be written to System.out. If file ends with '.gz', output will be gzipped.
-p INTEGER : Minimal number of nucleotides that have to overlap in order to merge the forward and reverse read. (default: 10)
-q INTEGER : Minimum base quality for quality trimming. (default: 20)
-qo INTEGER : Phred Score offset. (default: 33)
-qt : Enable quality trimming for non-merged reads. (default: true)
-qualFreqBadReads DOUBLE : Percentage of reads that have to fulfill minimal base quality criterion. (default: 0.9)
-r REVERSE_ADAPTER_STRING : Reverse reads adapter sequence. (default: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA)
-rm_no_partner : Remove reads with no pairing partner after adapter clipping. (default: false)
-timeEstimation : Perform remaining time estimation. Note: this can take long for large gzipped input files. (default: false)
-trim3p INTEGER : Trim N nucleotides from the 3' end of each read. This step is performed after adapter clipping. Reverse reads are not reverse trancriped before trimming. (default: 0)
-trim5p INTEGER : Trim N nucleotides from the 5' end of each read. This step is performed after adapter clipping. Reverse reads are not reverse transcriped before trimming. (default: 0)
-u FORWARD_FILE REVERSE_FILE : Write unmerged forward and reverse reads to extra files. Unmerged forward reads are written to the file 'FORWARD_FILE'. Unmerged reverse reads are written to the file 'REVERSE_FILE', i.e. the regular output file then only contains merged reads! Attention: If the option '-rm_no_partner' is not selected the two given output files also contain forward/reverse reads with no pairing partner! If filenames end with '.gz' gzipped output is produced!
-verbose : Print additional processing information (default: false)
——
Usage
Run the tool by specifying the paired-end read files and the adapter sequences.
ClipAndMerge -verbose -l 25 -p 10 -q 20 -e 0.05 \
  -f AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
  -r AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA \
  -in1 pair_1.fastq -in2 pair_2.fastq \
  -o merged.fq.gz
- -in1 Forward reads input file(s) in fastq(.gz) file format.
- -in2 Reverse reads input file(s) in fastq(.gz) file format.
- -o Output file
- -p Minimal number of nucleotides that have to overlap in order to merge the forward and reverse read. (default: 10)
- -q Minimum base quality for quality trimming. (default: 20)
- -e Error rate for merging forward and reverse reads. A value of 0.05 means that 5% mismatches are allowed in the overlap region. (default: 0.05)
- -l Discard sequences shorter than this number of nucleotides after adapter clipping. (default: 25)
- -verbose Print additional processing information (default: false)
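According to the help output, the `-u` option writes unmerged forward and reverse reads to two extra files, so the `-o` file then holds only merged reads. Below is a minimal sketch of such a run, assuming the same input files as in the command above. The names `unmerged_1.fq.gz` and `unmerged_2.fq.gz` are placeholders chosen for illustration and do not appear in the original post. The script prints the assembled command and only executes it when ClipAndMerge and both input files are found.

```shell
#!/bin/sh
# Sketch: keep merged reads in merged.fq.gz and write unmerged reads to
# separate files. All flags are taken from the `ClipAndMerge -h` output.
# unmerged_1.fq.gz / unmerged_2.fq.gz are placeholder names (assumption).
in1=pair_1.fastq
in2=pair_2.fastq

# Assemble the argument list as positional parameters.
# -u              : write unmerged forward/reverse reads to the two files
# -rm_no_partner  : remove reads with no pairing partner after adapter clipping
set -- ClipAndMerge -in1 "$in1" -in2 "$in2" \
  -o merged.fq.gz \
  -u unmerged_1.fq.gz unmerged_2.fq.gz \
  -rm_no_partner

# Print the command for inspection.
cmd_line="$*"
echo "$cmd_line"

# Execute only if the tool is installed and both input files exist.
if command -v ClipAndMerge >/dev/null 2>&1 && [ -f "$in1" ] && [ -f "$in2" ]; then
  "$@"
fi
```

As the help text notes, if `-rm_no_partner` is left out, the two `-u` files also contain forward/reverse reads that have no pairing partner.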
Reference
https://github.com/apeltzer/ClipAndMerge