ClipAndMerge: adapter trimming, quality trimming, and paired-end read merging in one step

ClipAndMerge is a tool published on GitHub by Alexander Peltzer that performs adapter trimming, quality trimming, and paired-end read merging in a single pass. A single command line gives you merged FASTQ output.

Installation

Tested on macOS 10.14 with miniconda3-4.2.12.

Source code: GitHub

# Install with conda in an Anaconda environment
conda install -c bioconda -y ClipAndMerge


$ ClipAndMerge -h

ClipAndMerge (v. 1.7.8)
Integrative Transcriptomics
University of Tübingen
Author: Günter Jäger

This tool clips adapters from fastq sequences and merges overlapping regions from forward and reverse reads.
Input sequences are accepted in fastq, or in gzipped fastq format.

Option "-in1" is required
java -jar ClipAndMerge.jar [options...]
Example: java -jar ClipAndMerge.jar -in1 STRING

-discardBadReads : Discard reads after merging that do not fulfill the quality criteria. (default: false)
-e DOUBLE : Error rate for merging forward and reverse reads. A value of 0.05 means that 5% mismatches are allowed in the overlap region. (default: 0.05)
-f FORWARD_ADAPTER_STRING : Forward reads adapter sequence. (default: AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC)
-h : Display this help page and exit. (default: true)
-in1 STRING : Forward reads input file(s) in fastq(.gz) file format.
-in2 STRING[] : Reverse reads input file(s) in fastq(.gz) file format.
-l INTEGER : Discard sequences shorter than this number of nucleotides after adapter clipping. (default: 25)
-lastBase INTEGER : Reads are trimmed from the 3' end until given value is reached. Trimming is not performed if read is already <= given value. If this option is given the '-trim3p' option is disregarded! Given value sould be 1-based! (default: 2147483647)
-log LOG_FILE_STRING : Write log messages to a file instead of the standard error stream.
-m INTEGER : Require a minimum adapter alignment length. If less nucleotides align with the adapter, the sequences are not clipped. (default: 8)
-maxParallelReads NUM_READS_INTEGER : Maximal number of reads, that can be processed in parallel. This number largely depends on the processing system settings! Only change it if you know what you are doing! (default: 1000)
-minQualBadReads INTEGER : Minimal base quality for keeping bad reads. If 0 is specified, then all reads are kept. (default: 0)
-n : Discard sequences with unknown (N) nucleotides. Default is to keep such sequences. (default: false)
-no_clip_stats : Disable the display of clipping statistics. (default: false)
-no_clipping : Skip adapter clipping. Only read merging is performed! (This is only recommended if every forward and reverse read has a corresponding partner in the other respective fastq-file! Otherwise merging can not be performed correctly. (default: false)
-no_merging : Skip read merging for paired-end sequencing data! Only adapter clipping is performed. This parameter is not needed for single-end data. (default: false)
-no_qbMM : Do not perform quality based mismatch calculation for merging. Default is to take quality scores into account. (default: false)
-o OUTPUT_FILE_STRING : Output file. If no file is provided, output will be written to System.out. If file ends with '.gz', output will be gzipped.
-p INTEGER : Minimal number of nucleotides that have to overlap in order to merge the forward and reverse read. (default: 10)
-q INTEGER : Minimum base quality for quality trimming. (default: 20)
-qo INTEGER : Phred Score offset. (default: 33)
-qt : Enable quality trimming for non-merged reads. (default: true)
-qualFreqBadReads DOUBLE : Percentage of reads that have to fulfill minimal base quality criterion. (default: 0.9)
-r REVERSE_ADAPTER_STRING : Reverse reads adapter sequence. (default: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA)
-rm_no_partner : Remove reads with no pairing partner after adapter clipping. (default: false)
-timeEstimation : Perform remaining time estimation. Note: this can take long for large gzipped input files. (default: false)
-trim3p INTEGER : Trim N nucleotides from the 3' end of each read. This step is performed after adapter clipping. Reverse reads are not reverse trancriped before trimming. (default: 0)
-trim5p INTEGER : Trim N nucleotides from the 5' end of each read. This step is performed after adapter clipping. Reverse reads are not reverse transcriped before trimming. (default: 0)
-u FORWARD_FILE REVERSE_FILE : Write unmerged forward and reverse reads to extra files. Unmerged forward reads are written to the file 'FORWARD_FILE'. Unmerged reverse reads are written to the file 'REVERSE_FILE', i.e. the regular output file then only contains merged reads!
    Attention: If the option '-rm_no_partner' is not selected the two given output files also contain forward/reverse reads with no pairing partner!
    If filenames end with '.gz' gzipped output is produced!
-verbose : Print additional processing information (default: false)
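The -qo and -q options refer to Phred quality scores: each character on a FASTQ quality line encodes a score as its ASCII code minus the offset (33 by default), and -q 20 is the minimum base quality used for quality trimming. The sketch below only illustrates that encoding; it is not ClipAndMerge's trimming logic, and phred_scores and count_low_quality are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Decode a FASTQ quality string into space-separated Phred scores.
# Assumes Phred+33 encoding (ClipAndMerge's default -qo 33) unless an
# offset is passed as the second argument.
phred_scores() {
  local qual=$1 offset=${2:-33}
  # Feed the string via stdin so backslashes are not treated as escapes.
  printf '%s\n' "$qual" | LC_ALL=C awk -v off="$offset" '
    BEGIN { for (i = 33; i < 127; i++) code[sprintf("%c", i)] = i }
    {
      out = ""
      for (i = 1; i <= length($0); i++)
        out = out (i > 1 ? " " : "") (code[substr($0, i, 1)] - off)
      print out
    }'
}

# Count bases scoring below a threshold (default 20, mirroring -q 20).
# Illustration only: this does not reproduce ClipAndMerge's trimming.
count_low_quality() {
  local min_q=${2:-20} offset=${3:-33} n=0 s
  for s in $(phred_scores "$1" "$offset"); do
    if (( s < min_q )); then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

For example, the quality string II5# decodes to 40 40 20 2, so exactly one base (Q2) falls below the default threshold of 20.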

——

Usage

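Run it by specifying the paired-end read files and the adapter sequences. The values passed here for -l, -p, -q, -e, -f and -r are the same as the defaults listed in the help output, so they are written out only to make the settings explicit.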

ClipAndMerge -verbose -l 25 -p 10 -q 20 -e 0.05 \
  -f AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
  -r AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA \
  -in1 pair_1.fastq -in2 pair_2.fastq \
  -o merged.fq.gz
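According to the help output, adding -u FORWARD_FILE REVERSE_FILE writes unmerged reads to two extra files, so the regular -o output then contains only merged reads. Counting records in each file is one rough way to check how many pairs were merged. The count_reads function below is a hypothetical helper, not part of ClipAndMerge, and assumes standard four-line FASTQ records.

```shell
#!/usr/bin/env bash
# Count records in a FASTQ or gzipped FASTQ file.
# Assumes the standard 4-line-per-record layout (no wrapped sequence lines).
count_reads() {
  local file=$1 lines
  if [[ $file == *.gz ]]; then
    lines=$(gzip -dc -- "$file" | wc -l)
  else
    lines=$(wc -l < "$file")
  fi
  # wc may pad its output with spaces; arithmetic expansion ignores them.
  echo $(( lines / 4 ))
}
```

For example, after adding -u unmerged_1.fq.gz unmerged_2.fq.gz to the command (placeholder file names), compare count_reads merged.fq.gz with count_reads pair_1.fastq. Reads removed by filters such as -l also lower the merged count, so treat the ratio as approximate.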

Options used in the ClipAndMerge command above:
-in1 Forward reads input file(s) in fastq(.gz) file format.
-in2 Reverse reads input file(s) in fastq(.gz) file format.
-o Output file. If the name ends with '.gz', the output is gzipped; if -o is omitted, output is written to standard output.
-p Minimal number of nucleotides that have to overlap in order to merge the forward and reverse read. (default: 10)
-q Minimum base quality for quality trimming. (default: 20)
-e Error rate for merging forward and reverse reads. A value of 0.05 means that 5% mismatches are allowed in the overlap region. (default: 0.05)
-l Discard sequences shorter than this number of nucleotides after adapter clipping. (default: 25)
-verbose Print additional processing information (default: false)
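The -p and -e options together decide whether a read pair can be merged. The following is a simplified sketch of that idea, not ClipAndMerge's actual implementation: it reverse-complements the reverse read, tries overlaps from the longest possible length down to the minimum (-p), and accepts the first one whose mismatch count is at most overlap length times the error rate (-e). The helper names revcomp and merge_pair are hypothetical; the sketch ignores base qualities (ClipAndMerge uses them by default, see -no_qbMM) and simply keeps the forward read's bases in the overlap.

```shell
#!/usr/bin/env bash
# Simplified illustration of overlap-based pair merging (NOT ClipAndMerge's code).

# Reverse-complement a DNA sequence; characters other than A/C/G/T (e.g. N) are kept.
revcomp() {
  printf '%s\n' "$1" | LC_ALL=C awk '
    BEGIN { comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A" }
    {
      out = ""
      for (i = length($0); i >= 1; i--) {
        c = substr($0, i, 1)
        out = out ((c in comp) ? comp[c] : c)
      }
      print out
    }'
}

# merge_pair FWD REV MIN_OVERLAP ERROR_RATE
# Prints the merged sequence, or nothing if no overlap of at least
# MIN_OVERLAP bases has mismatches <= overlap length * ERROR_RATE.
merge_pair() {
  local fwd=$1 rc
  rc=$(revcomp "$2")
  LC_ALL=C awk -v f="$fwd" -v r="$rc" -v p="$3" -v e="$4" 'BEGIN {
    lf = length(f); lr = length(r)
    max = (lf < lr) ? lf : lr
    # Try the longest possible overlap first, down to the minimum length p.
    for (ov = max; ov >= p; ov--) {
      mm = 0
      # Compare the last ov bases of f with the first ov bases of r.
      for (i = 1; i <= ov; i++)
        if (substr(f, lf - ov + i, 1) != substr(r, i, 1)) mm++
      if (mm <= ov * e) {
        # Keep the forward bases in the overlap, then append the rest of r.
        print f substr(r, ov + 1)
        exit
      }
    }
  }'
}
```

For example, merge_pair ACGTACGTTGCAAGG GTAAGCCTTGCAACG 10 0.05 finds a perfect 10 nt overlap and prints ACGTACGTTGCAAGGCTTAC, while raising the minimum overlap to 11 returns nothing. Under this sketch's rule, an error rate of 0.05 tolerates a mismatch only once the overlap reaches 20 nt (20 × 0.05 = 1), so shorter overlaps must match exactly.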