macでインフォマティクス (Informatics on Mac)

A collection of informatics notes related to HTS (NGS).

sickle, a quality trimming tool

2020 10/31: added installation instructions

2020 11/24: added help output

2021 6/15: added commands

 

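sickle is a quality trimming tool for FASTQ files. It scans each read with a sliding window 0.1 times the read length and trims the regions where the average quality in the window falls below the specified threshold. Like Trimmomatic, it keeps paired reads in order by writing the same number of reads to both paired outputs (orphan reads go to a separate file).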
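The windowing idea can be sketched in a few lines of shell and awk. This is a simplified illustration, not sickle's actual implementation: `window_trim` is a hypothetical helper, it assumes Phred+33 (sanger) qualities, and it cuts at window boundaries without the finer position adjustment that sickle performs inside the window.

```shell
# Hypothetical helper: print "keep_start keep_end" (0-based, end exclusive)
# for the region to keep, or "discard" if no window reaches the threshold.
window_trim() {
  # $1 = quality string (Phred+33 assumed), $2 = quality threshold
  printf '%s\n' "$1" | awk -v thr="$2" '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
    {
      n = length($0)
      w = int(0.1 * n); if (w < 1) w = 1      # window = 0.1 x read length
      keep_start = -1; keep_end = n
      for (i = 1; i + w - 1 <= n; i++) {
        sum = 0
        for (j = i; j < i + w; j++) sum += ord[substr($0, j, 1)] - 33
        avg = sum / w
        if (keep_start < 0) {
          # 5-prime side: first window whose average reaches the threshold
          if (avg >= thr) keep_start = i - 1
        } else if (avg < thr) {
          # 3-prime side: first later window whose average drops below it
          keep_end = i - 1
          break
        }
      }
      if (keep_start < 0) print "discard"
      else print keep_start, keep_end
    }'
}
```

For a 20-base read the window is 2 bases wide, so a read with four low-quality bases at each end is trimmed on both sides.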

 

 

Installation

sickle can be installed with conda or brew.

GitHub: https://github.com/najoshi/sickle

#conda
conda install -c bioconda -y sickle-trim

#homebrew
brew install sickle

sickle se

$ sickle se

 

Usage: sickle se [options] -f <fastq sequence file> -t <quality type> -o <trimmed fastq file>

 

Options:

-f, --fastq-file, Input fastq file (required)

-t, --qual-type, Type of quality values (solexa (CASAVA < 1.3), illumina (CASAVA 1.3 to 1.7), sanger (which is CASAVA >= 1.8)) (required)

-o, --output-file, Output trimmed fastq file (required)

-q, --qual-threshold, Threshold for trimming based on average quality in a window. Default 20.

-l, --length-threshold, Threshold to keep a read based on length after trimming. Default 20.

-x, --no-fiveprime, Don't do five prime trimming.

-n, --trunc-n, Truncate sequences at position of first N.

-g, --gzip-output, Output gzipped files.

--quiet, Don't print out any trimming information

--help, display this help and exit

--version, output version information and exit

 

sickle pe -h

$ sickle pe -h

sickle: invalid option -- h

 

If you have separate files for forward and reverse reads:

Usage: sickle pe [options] -f <paired-end forward fastq file> -r <paired-end reverse fastq file> -t <quality type> -o <trimmed PE forward file> -p <trimmed PE reverse file> -s <trimmed singles file>

 

If you have one file with interleaved forward and reverse reads:

Usage: sickle pe [options] -c <interleaved input file> -t <quality type> -m <interleaved trimmed paired-end output> -s <trimmed singles file>

 

If you have one file with interleaved reads as input and you want ONLY one interleaved file as output:

Usage: sickle pe [options] -c <interleaved input file> -t <quality type> -M <interleaved trimmed output>

 

Options:

Paired-end separated reads

--------------------------

-f, --pe-file1, Input paired-end forward fastq file (Input files must have same number of records)

-r, --pe-file2, Input paired-end reverse fastq file

-o, --output-pe1, Output trimmed forward fastq file

-p, --output-pe2, Output trimmed reverse fastq file. Must use -s option.

 

Paired-end interleaved reads

----------------------------

-c, --pe-combo, Combined (interleaved) input paired-end fastq

-m, --output-combo, Output combined (interleaved) paired-end fastq file. Must use -s option.

-M, --output-combo-all, Output combined (interleaved) paired-end fastq file with any discarded read written to output file as a single N. Cannot be used with the -s option.

 

Global options

--------------

-t, --qual-type, Type of quality values (solexa (CASAVA < 1.3), illumina (CASAVA 1.3 to 1.7), sanger (which is CASAVA >= 1.8)) (required)

-s, --output-single, Output trimmed singles fastq file

-q, --qual-threshold, Threshold for trimming based on average quality in a window. Default 20.

-l, --length-threshold, Threshold to keep a read based on length after trimming. Default 20.

-x, --no-fiveprime, Don't do five prime trimming.

-n, --truncate-n, Truncate sequences at position of first N.

-g, --gzip-output, Output gzipped files.

--quiet, do not output trimming info

--help, display this help and exit

--version, output version information and exit

 

 

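Input and output

  • Supported quality formats

=> Illumina, Solexa, and Sanger.

  • The third line of each record (the + line) is always written as a bare +, the standard in CASAVA >= 1.8, regardless of the input.
  • Gzip-compressed input files are also supported.
  • Output is uncompressed FASTQ by default.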
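Since -t is required, it can help to guess the quality encoding before running sickle. The following is a rough sketch based on a common heuristic (the lowest ASCII code seen in the quality lines), not something sickle itself provides. `guess_qual_type` is a hypothetical helper, it reads only the first 4000 lines of an uncompressed file, and a result of illumina cannot fully rule out solexa.

```shell
# Hypothetical helper: guess a value for sickle's -t option from the
# lowest quality character in the first 1000 records.
guess_qual_type() {
  # $1 = uncompressed FASTQ file with 4 lines per record
  head -n 4000 "$1" | awk '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i; min = 127 }
    NR % 4 == 0 {
      for (j = 1; j <= length($0); j++) {
        c = ord[substr($0, j, 1)]
        if (c > 0 && c < min) min = c
      }
    }
    END {
      if (min < 59)      print "sanger"     # Phred+33
      else if (min < 64) print "solexa"     # Solexa+64 can go below ASCII 64
      else               print "illumina"   # Phred+64 (heuristic only)
    }'
}

# Possible usage (file name is an example):
# sickle se -f single.fq -t "$(guess_qual_type single.fq)" -o trimmed_output.fq
```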

 

 

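Running

Single-end. Trim regions where the windowed average quality drops below Q30 (-q 30), and discard reads that end up too short under the 40-bp length threshold (-l 40).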

sickle se -f single.fq -t sanger -o trimmed_output.fq -q 30 -l 40
  • se single-end sequence trimming
  • -f Input fastq file (required)
  • -t Type of quality values (solexa (CASAVA < 1.3), illumina (CASAVA 1.3 to 1.7), sanger (which is CASAVA >= 1.8)) (required)
  • -o Output trimmed fastq file (required)
  • -q Threshold for trimming based on average quality in a window. Default 20.
  • -l  Threshold to keep a read based on length after trimming. Default 20.
  • -x Don't do five prime trimming.

With -x, only the 3' end is trimmed.
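One way to sanity-check the -l setting is to look at the shortest read left in the output. The sketch below is not part of sickle: `min_read_length` is a hypothetical helper that assumes an uncompressed FASTQ file with exactly 4 lines per record.

```shell
# Hypothetical helper: print the length of the shortest read in a FASTQ file
# (prints 0 for an empty file).
min_read_length() {
  # $1 = uncompressed FASTQ file with 4 lines per record
  awk 'NR % 4 == 2 { n = length($0); if (min == "" || n < min) min = n }
       END { print (min == "" ? 0 : min) }' "$1"
}

# Possible usage after the single-end run above:
# min_read_length trimmed_output.fq
```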

 

 

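Paired-end. Trim regions where the windowed average quality drops below Q30, and discard reads that end up too short under the 20-bp length threshold. Adding -g produces gzip-compressed output.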

sickle pe -f R1.fq.gz -r R2.fq.gz -t sanger -o trimmed_R1.fq.gz -p trimmed_R2.fq.gz -s trimmed_singles.fq.gz -q 30 -l 20 -g
  • pe paired-end sequence trimming
  • -f Input paired-end forward fastq file (Input files must have same number of records)
  • -r Input paired-end reverse fastq file
  • -o Output trimmed forward fastq file
  • -p Output trimmed reverse fastq file. Must use -s option.
  • -s Output trimmed singles fastq file
  • -q Threshold for trimming based on average quality in a window. Default 20.
  • -l Threshold to keep a read based on length after trimming. Default 20.
  • -n Truncate sequences at position of first N.
  • -g    Output gzipped files.
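Because the forward and reverse outputs are meant to stay in sync, a quick record count is a cheap check after a paired-end run. This is a sketch with hypothetical helper names (`count_reads`, `pairs_in_sync`), assuming 4 lines per record; it handles both plain and gzip-compressed files.

```shell
# Hypothetical helper: count FASTQ records in a plain or gzip-compressed file.
count_reads() {
  # $1 = FASTQ file (plain or .gz); assumes 4 lines per record
  case "$1" in
    *.gz) gzip -cd "$1" ;;
    *)    cat "$1" ;;
  esac | awk 'END { print NR / 4 }'
}

# Hypothetical helper: succeed only if both files hold the same number of records.
pairs_in_sync() {
  [ "$(count_reads "$1")" = "$(count_reads "$2")" ]
}

# Possible usage after the paired-end run above:
# pairs_in_sync trimmed_R1.fq.gz trimmed_R2.fq.gz && echo "pairs in sync"
```

Equal counts do not prove that read names match pair by pair, but a mismatch shows immediately that something went wrong.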

 

Paired-end interleaved file.

sickle pe -c interlace.fastq -t sanger -m interlace_trimmed.fastq -s trimmed_singles.fastq
  • -c Combined (interleaved) input paired-end fastq
  • -m Output combined (interleaved) paired-end fastq file. Must use -s option.
  • -M Output combined (interleaved) paired-end fastq file with any discarded read written to output file as a single N. Cannot be used with the -s option.
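If the reads are in separate forward and reverse files, an interleaved input for -c can be built with standard tools. The sketch below uses a hypothetical helper, `interleave_fastq`, and assumes uncompressed files with 4 lines per record, the same record order in both files, and no tab characters in the header lines.

```shell
# Hypothetical helper: write an interleaved FASTQ (R1 record, then its R2 mate)
# to standard output.
interleave_fastq() {
  # $1 = forward FASTQ, $2 = reverse FASTQ (uncompressed, same record order)
  paste "$1" "$2" |      # line k of R1 next to line k of R2
    paste - - - - |      # join the 4 lines of each record pair onto one row
    awk -F '\t' -v OFS='\n' '{ print $1, $3, $5, $7, $2, $4, $6, $8 }'
}

# Possible usage (file names are examples):
# interleave_fastq R1.fq R2.fq > interlace.fastq
```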

 

 

 

Test

Using recently sequenced data.

p='R1.fastq'
q='R2.fastq'
mkdir raw_data_qc_reports
mkdir Quality30_trimmed_reports
sickle pe -f $p -r $q -t sanger -o ${p%.fastq}_Q30_trimmed.fastq -p ${q%.fastq}_Q30_trimmed.fastq -s trimmed_singles.fastq -q 30 -l 20
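The `${p%.fastq}` expansion removes the suffix only when the file name really ends in .fastq; with any other extension the name is left unchanged and the output name simply gets the new suffix appended. A small sketch that copes with the usual variants is shown below. `strip_fastq_ext` is a hypothetical helper, not part of sickle.

```shell
# Hypothetical helper: remove a trailing .fastq or .fq, with or without .gz.
strip_fastq_ext() {
  name=${1%.gz}        # drop .gz if present
  name=${name%.fastq}  # drop .fastq if present
  name=${name%.fq}     # drop .fq if present
  printf '%s\n' "$name"
}

# Possible usage (file name is an example):
# out="$(strip_fastq_ext R1.fq.gz)_Q30_trimmed.fastq"
```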

 

Analysis with FastQC

fastqc --nogroup -o ./raw_data_qc_reports $p $q 

Before trimming

[Figure: FastQC report before trimming]

After trimming

a=${p%.fastq}_Q30_trimmed.fastq 
b=${q%.fastq}_Q30_trimmed.fastq
fastqc --nogroup -o ./Quality30_trimmed_reports $a $b

[Figure: FastQC report after trimming]

 

Reference

Joshi NA, Fass JN. (2011). Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files (Version 1.33) [Software]. Available at https://github.com/najoshi/sickle.