Informatics on Mac


Notes on bioinformatics for NGS (next-generation sequencing) data.

QTrim: a quality trimming tool for Roche 454 reads

QTrim is a quality trimming tool for Roche/454 reads. It is reported to perform comparably to PRINSEQ.

Official website

http://hiv.sanbi.ac.za/software/qtrim#Installation

Web server

http://hiv.sanbi.ac.za/tools/#/qtrim

Installation

An executable binary and the 454 test dataset QTrimTestData.tar can be downloaded from the official website.
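The test data is a plain tar archive, which you would normally unpack with tar -xf QTrimTestData.tar. If you would rather script it, here is a minimal sketch using Python's standard tarfile module. unpack_test_data is a hypothetical helper, and it makes no assumptions about what the archive contains.

```python
import tarfile
from pathlib import Path


def unpack_test_data(tar_path, dest="."):
    """Extract a tar archive (e.g. QTrimTestData.tar) into dest and return its member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        names = tar.getnames()
        # Newer Python releases provide extraction filters; use the safe "data"
        # filter when it exists, otherwise fall back to a plain extraction.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

For example, unpack_test_data("QTrimTestData.tar", "qtrim_test") extracts the files into qtrim_test and returns their names.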

> ./QTrim_v1_1 -h

 user$ QTrim 

*****************************************

LICENSING:

 

 

QTrim 

 

Copyright (c) 2013, QTrim Development Team (QDT)

 

 

QTrim is freely available for use for non-commercial users and there is no restriction for academic use of QTrim.  Commercial use may be restricted and such users should contact Prof Simon Travers for further details (simon@sanbi.ac.za).

 

All rights reserved.

 

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

 

 

The software listed below is called by QTrim and is bundled in the executable file to facilitate easy usage and installation of QTrim.  The QTrim development team have in no way modified any aspect of the softwares listed below.

 

Matplotlib (Copyright (c) 2012-2013 Matplotlib Development Team; All Rights Reserved) is distributed under the BSD license with licensing information available at at http://matplotlib.org/users/license.html.  Matplotlib is available at www.matplotlib.org

 

Numpy (Copyright (c) 2005, NumPy Developers) is distributed under BSD license (http://docs.scipy.org/doc/numpy/license.html). Numpy is available at: www.numpy.org

 

Biopython is available under GNU free license 1.2  (http://www.biopython.org/DIST/LICENSE). Biopython is available at: www.biopython.org

 

*****************************************

QTrim: Highly sensitive 454 pyrosequence Quality Trimming tool

QTrim Version: 1.1

 

Required options:

Input File: -fastq fastqfile OR both -fasta fastafile -qual qualityfile

 

Other options:

 

Output file: -o [Default filename: Outputfile]

Mean quality: -m [INT] Range: 0-40 Default: 20

Minimum read length: -l [INT] Default: 50

Mode: -mode [1,2,3,4] Default: 2 

remove keys: -rk [INT]

Verbose: -verbose

WindowSize: -w [INT] [Default is mininum length]

Output file format: -out_format [Output file format: 1) Fastq file with INT quality score 2) Fastq file with ASCII quality score 3) Sequence and Quality in different Fasta file with Base name provided in output filename]

Sequence statistics in id: -seq_id_stat

Analytical plotting: -plot plot_format (supports these formats:eps, pdf, svg, svgz )

 

Example command:

QTrim_v1_1 -fastq myfastqfile #Runs with all default values

QTrim_v1_1 -verbose -fasta fastafile -qual qualityfile -l 10 -m 30 -o outputfilename -mode 2 -out_format 2

OR

/Fullpath/to/QTrim_v1_1 -fastq fastqfile -l 50 -m 30 -o outputfilename -mode 3 -out_format 3 -seq_id_stat -plot pdf
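The -m option (mean quality, range 0-40, default 20) and the -l option (minimum read length, default 50) in the help output set thresholds on PHRED scores and read length. The sketch below is a whole-read filter that makes those numbers concrete. It assumes FASTQ qualities use the common Phred+33 ASCII encoding. It is not QTrim's algorithm: QTrim trims within reads and offers several -mode settings, whereas this sketch only computes a read's mean quality and applies both thresholds to the untrimmed read. All function names are hypothetical.

```python
def phred33_to_int(qual_line):
    """Decode a Phred+33 ASCII quality string into integer PHRED scores."""
    return [ord(ch) - 33 for ch in qual_line]


def mean_quality(qual_line):
    """Arithmetic mean of the decoded PHRED scores (0.0 for an empty string)."""
    scores = phred33_to_int(qual_line)
    return sum(scores) / len(scores) if scores else 0.0


def passes(seq, qual_line, min_mean_q=20, min_len=50):
    """Keep a read only if it is long enough and its mean quality meets the threshold."""
    return len(seq) >= min_len and mean_quality(qual_line) >= min_mean_q


def read_fastq(path):
    """Yield (header, sequence, quality) tuples from a 4-line-per-record FASTQ file."""
    with open(path) as fh:
        while True:
            header = fh.readline().rstrip("\n")
            if not header:
                break
            seq = fh.readline().rstrip("\n")
            fh.readline()  # '+' separator line
            qual = fh.readline().rstrip("\n")
            yield header, seq, qual
```

For example, "I5+!" decodes to [40, 20, 10, 0] (mean 17.5), so that read would fail the default -m 20 threshold even before the length check.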

Move the binary into a directory on your PATH, or create a symbolic link to it there.
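To confirm that the binary or link is actually reachable, you can check from Python with the standard library's shutil.which. This is a minimal sketch. The candidate names are assumptions about how you named the link: QTrim is the name used in the run command in the next section, and QTrim_v1_1 is the downloaded file name. find_qtrim is a hypothetical helper.

```python
import shutil


def find_qtrim(names=("QTrim", "QTrim_v1_1")):
    """Return the full path of the first candidate executable found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None
```

If find_qtrim() returns None, the directory holding the binary is not on PATH, or the file is not marked executable.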

 

Running QTrim

Run it on the test data.

QTrim -fastq Poor_quality_dataset.fastq -o output -plot pdf -out_format 2 
  • -fastq fastq file that contains both sequence data and quality scores. Quality scores should be in PHRED format.
  • -o Output filename.
  • -out_format Output file format. Options: 1: fastq format with sequence quality scores as integer values. 2: fastq format with sequence quality scores as ASCII characters. 3: separate sequence (fasta) and quality (.qual) files with quality scores as integer values (default: 2).
  • -plot If this option is invoked, QTrim will produce a number of plots of the statistics associated with the trimming (see below for further details). Available output formats are: eps, pdf, svg, svgz. If this option is not invoked, trimming will continue without outputting graphs.
  • -verbose Prints verbose output to the screen while processing and trimming sequence reads.
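The same flags can be assembled programmatically, which is handy when trimming many files. This is a minimal sketch that assumes the binary is reachable as QTrim. build_qtrim_cmd and run_qtrim are hypothetical helpers. The flag names and allowed values come from the -h output, and the defaults mirror the ones listed there (-m 20, -l 50, -mode 2, -out_format 2).

```python
import shutil
import subprocess


def build_qtrim_cmd(fastq, out="Outputfile", mean_q=20, min_len=50,
                    mode=2, out_format=2, plot=None, exe="QTrim"):
    """Assemble a QTrim argument list from the options shown in its -h output."""
    if not 0 <= mean_q <= 40:
        raise ValueError("-m (mean quality) must be in the range 0-40")
    if mode not in (1, 2, 3, 4):
        raise ValueError("-mode must be 1, 2, 3 or 4")
    if out_format not in (1, 2, 3):
        raise ValueError("-out_format must be 1, 2 or 3")
    cmd = [exe, "-fastq", str(fastq), "-o", str(out),
           "-m", str(mean_q), "-l", str(min_len),
           "-mode", str(mode), "-out_format", str(out_format)]
    if plot is not None:
        if plot not in ("eps", "pdf", "svg", "svgz"):
            raise ValueError("-plot supports eps, pdf, svg or svgz")
        cmd += ["-plot", plot]
    return cmd


def run_qtrim(cmd):
    """Run the command if its executable is on PATH; raise FileNotFoundError otherwise."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.run(cmd, check=True)
```

build_qtrim_cmd("Poor_quality_dataset.fastq", out="output", plot="pdf") reproduces the test-data run above, with the default thresholds written out explicitly. Passing a list to subprocess.run avoids shell quoting problems with unusual file names.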

QTrim writes the trimmed fastq along with the read counts and other statistics.

Statistics

user$ head Outputfile_stat.txt 

Total reads input: 33022

Total reads output: 32835

Maximum read length in output: 479

Minimum read length in output: 50

Mean read length in output: 274
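If you process several datasets, it can help to read the _stat.txt file programmatically. This is a minimal sketch that assumes each statistic sits on its own "Key: integer" line, as in the head output above. Lines in any other shape are skipped. parse_stats is a hypothetical helper.

```python
def parse_stats(text):
    """Parse 'Key: integer' lines from a QTrim *_stat.txt file into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[key.strip()] = int(value)
    return stats
```

In the test-data run above, 33022 - 32835 = 187 reads (about 0.57%) were removed. You can compute this as stats["Total reads input"] - stats["Total reads output"].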

 

With -plot, several figures are written in addition to the fastq.

[Figures: three pairs of Before/After plots comparing the reads before and after trimming]

Citation

QTrim: a novel tool for the quality trimming of sequence reads generated using the Roche/454 sequencing platform

Shrestha RK, Lubinsky B, Bansode VB, Moinz MB, McCormack GP, Travers SA

BMC Bioinformatics. 2014 Jan 30;15:33.