2020 10/31 Added installation notes
2020 11/24 Added help output
2021 6/15 Added commands
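sickle is a quality-trimming tool for FASTQ files. It scans each read with a sliding window whose size is 0.1 times the read length and trims regions where the quality falls below the specified threshold. Like Trimmomatic, it keeps paired reads in order by writing the same number of reads to both paired outputs (orphaned reads go to a separate file).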
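To make the windowing idea concrete, here is a minimal shell sketch (not sickle's actual code) that scans a Phred+33 quality string with a window of 0.1 times the read length and reports the 1-based start of the first window whose mean quality is below a threshold. The function name `first_low_window` is made up for this illustration, and falling back to one read-length window for reads shorter than 10 bp is an assumption. sickle itself goes on to decide where inside the window to cut, which this sketch does not attempt.

```shell
# Illustration only: find the first window whose mean Phred score is below a threshold.
# $1: quality string (Phred+33 assumed), $2: quality threshold
first_low_window() {
  printf '%s\n' "$1" | awk -v thr="$2" '
    # Build a character-to-ASCII-code lookup table (awk has no ord function).
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
    {
      n = length($0)
      if (n == 0) { print 0; exit }
      w = int(0.1 * n)      # window size: 0.1 x read length
      if (w == 0) w = n     # assumption: very short reads use one window
      for (start = 1; start + w - 1 <= n; start++) {
        sum = 0
        for (j = start; j < start + w; j++) sum += ord[substr($0, j, 1)] - 33
        if (sum / w < thr) { print start; exit }
      }
      print 0               # no window fell below the threshold
    }'
}
```

For a 20-base read with eighteen `I` (Q40) followed by two `#` (Q2), `first_low_window 'IIIIIIIIIIIIIIIIII##' 30` prints `18`, the start of the first 2-base window whose mean drops below 30.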
Installation
sickle can be installed with conda or brew.
#conda
conda install -c bioconda -y sickle-trim
#homebrew
brew install sickle
> sickle se
$ sickle se
Usage: sickle se [options] -f <fastq sequence file> -t <quality type> -o <trimmed fastq file>
Options:
-f, --fastq-file, Input fastq file (required)
-t, --qual-type, Type of quality values (solexa (CASAVA < 1.3), illumina (CASAVA 1.3 to 1.7), sanger (which is CASAVA >= 1.8)) (required)
-o, --output-file, Output trimmed fastq file (required)
-q, --qual-threshold, Threshold for trimming based on average quality in a window. Default 20.
-l, --length-threshold, Threshold to keep a read based on length after trimming. Default 20.
-x, --no-fiveprime, Don't do five prime trimming.
-n, --trunc-n, Truncate sequences at position of first N.
-g, --gzip-output, Output gzipped files.
--quiet, Don't print out any trimming information
--help, display this help and exit
--version, output version information and exit
> sickle pe -h
$ sickle pe -h
sickle: invalid option -- h
If you have separate files for forward and reverse reads:
Usage: sickle pe [options] -f <paired-end forward fastq file> -r <paired-end reverse fastq file> -t <quality type> -o <trimmed PE forward file> -p <trimmed PE reverse file> -s <trimmed singles file>
If you have one file with interleaved forward and reverse reads:
Usage: sickle pe [options] -c <interleaved input file> -t <quality type> -m <interleaved trimmed paired-end output> -s <trimmed singles file>
If you have one file with interleaved reads as input and you want ONLY one interleaved file as output:
Usage: sickle pe [options] -c <interleaved input file> -t <quality type> -M <interleaved trimmed output>
Options:
Paired-end separated reads
--------------------------
-f, --pe-file1, Input paired-end forward fastq file (Input files must have same number of records)
-r, --pe-file2, Input paired-end reverse fastq file
-o, --output-pe1, Output trimmed forward fastq file
-p, --output-pe2, Output trimmed reverse fastq file. Must use -s option.
Paired-end interleaved reads
----------------------------
-c, --pe-combo, Combined (interleaved) input paired-end fastq
-m, --output-combo, Output combined (interleaved) paired-end fastq file. Must use -s option.
-M, --output-combo-all, Output combined (interleaved) paired-end fastq file with any discarded read written to output file as a single N. Cannot be used with the -s option.
Global options
--------------
-t, --qual-type, Type of quality values (solexa (CASAVA < 1.3), illumina (CASAVA 1.3 to 1.7), sanger (which is CASAVA >= 1.8)) (required)
-s, --output-single, Output trimmed singles fastq file
-q, --qual-threshold, Threshold for trimming based on average quality in a window. Default 20.
-l, --length-threshold, Threshold to keep a read based on length after trimming. Default 20.
-x, --no-fiveprime, Don't do five prime trimming.
-n, --truncate-n, Truncate sequences at position of first N.
-g, --gzip-output, Output gzipped files.
--quiet, do not output trimming info
--help, display this help and exit
--version, output version information and exit
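Input and output
- Supported quality formats
=> Illumina, Solexa, Sanger.
- The third line of each record (the + line) is always written as a bare +, the CASAVA >= 1.8 convention, regardless of the input.
- gzip-compressed input files are also supported.
- Output is uncompressed fastq by default.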
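Because -t is required, it helps to know which quality encoding a file uses. The following is a rough heuristic sketch, not part of sickle: it looks at the lowest ASCII code on the quality lines and maps it to one of sickle's three -t values. The function name `guess_qual_type` is hypothetical, and the cut-offs (below 59 for Phred+33, below 64 for Solexa+64) follow the commonly cited ASCII ranges for these encodings. A Phred+33 file that contains only high scores would be misreported, so treat the result as a hint.

```shell
# Heuristic only: guess the -t value from the lowest quality character in a FASTQ file.
# Assumes 4-line FASTQ records; reads plain or gzip-compressed files.
guess_qual_type() {
  case "$1" in
    *.gz) gzip -cd "$1" ;;
    *)    cat "$1" ;;
  esac | awk '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i; min = 127 }
    NR % 4 == 0 {                       # every 4th line holds the quality string
      for (j = 1; j <= length($0); j++) {
        c = ord[substr($0, j, 1)]
        if (c && c < min) min = c
      }
    }
    END {
      if (min == 127)    print "unknown"    # no quality characters seen
      else if (min < 59) print "sanger"     # only Phred+33 goes this low
      else if (min < 64) print "solexa"     # Solexa+64 starts at ASCII 59
      else               print "illumina"   # consistent with Phred+64, but ambiguous
    }'
}
```

For example, `guess_qual_type single.fq` prints one of `sanger`, `solexa`, `illumina`, or `unknown`.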
Run
Single-end. Trim regions where the windowed average quality falls below Q30, and discard reads shorter than 40 bp after trimming.
sickle se -f single.fq -t sanger -o trimmed_output.fq -q 30 -l 40
- se single-end sequence trimming
- -f Input fastq file (required)
- -t Type of quality values (solexa (CASAVA < 1.3), illumina (CASAVA 1.3 to 1.7), sanger (which is CASAVA >= 1.8)) (required)
- -o Output trimmed fastq file (required)
- -q Threshold for trimming based on average quality in a window. Default 20.
- -l Threshold to keep a read based on length after trimming. Default 20.
- -x Don't do five prime trimming.
With -x, only the 3' end is trimmed.
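When there are many single-end files, the same command can be wrapped in a loop. This is a minimal sketch under a few assumptions: inputs end in .fq, the reads are Phred+33 (-t sanger), and the helper names `trimmed_name` and `trim_all_se` are made up here. The sickle options are the ones from the single-end example above.

```shell
# Derive an output name: strip .gz, then .fastq or .fq, and append _trimmed.fq.
trimmed_name() (
  base=${1%.gz}
  base=${base%.fastq}
  base=${base%.fq}
  printf '%s_trimmed.fq\n' "$base"
)

# Trim every .fq file in a directory (default: current directory).
trim_all_se() {
  for fq in "${1:-.}"/*.fq; do
    [ -e "$fq" ] || continue                        # the glob matched nothing
    case "$fq" in *_trimmed.fq) continue ;; esac    # skip outputs of an earlier run
    sickle se -f "$fq" -t sanger -o "$(trimmed_name "$fq")" -q 30 -l 40
  done
}
```

Calling `trim_all_se` in a directory containing `single.fq` would write `single_trimmed.fq` next to it. Add -x to the sickle call to restrict trimming to the 3' end.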
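Paired-end. Trim regions where the windowed average quality falls below Q30, and discard reads shorter than 20 bp after trimming. Adding -g writes gzip-compressed output.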
sickle pe -f R1.fq.gz -r R2.fq.gz -t sanger -o trimmed_R1.fq.gz -p trimmed_R2.fq.gz -s trimmed_singles.fq.gz -q 30 -l 20 -g
- pe paired-end sequence trimming
- -f Input paired-end forward fastq file (Input files must have same number of records)
- -r Input paired-end reverse fastq file
- -o Output trimmed forward fastq file
- -p Output trimmed reverse fastq file. Must use -s option.
- -s Output trimmed singles fastq file
- -q Threshold for trimming based on average quality in a window. Default 20.
- -l Threshold to keep a read based on length after trimming. Default 20.
- -n Truncate sequences at position of first N.
- -g Output gzipped files.
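The help text says the two paired-end input files must have the same number of records. A quick pre-flight check can be sketched as below. The function name `fastq_records` is hypothetical, and the count assumes plain 4-line FASTQ records (no wrapped sequence lines).

```shell
# Count records in a FASTQ file (plain or gzip-compressed), assuming 4 lines per record.
fastq_records() {
  case "$1" in
    *.gz) gzip -cd "$1" ;;
    *)    cat "$1" ;;
  esac | awk 'END { print int(NR / 4) }'
}
```

For instance, comparing `fastq_records R1.fq.gz` with `fastq_records R2.fq.gz` before running `sickle pe` shows whether the inputs are balanced; differing counts suggest a truncated or mismatched file.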
Paired-end, interleaved file.
sickle pe -c interlace.fastq -t sanger -m interlace_trimmed.fastq -s trimmed_singles.fastq
- -c Combined (interleaved) input paired-end fastq
- -m Output combined (interleaved) paired-end fastq file. Must use -s option.
- -M Output combined (interleaved) paired-end fastq file with any discarded read written to output file as a single N. Cannot be used with the -s option.
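If the reads are in separate forward and reverse files, an interleaved input for -c can be built first. The sketch below is one way to do this with awk and is not part of sickle. The function name `interleave_fastq` is made up, and it assumes uncompressed 4-line records with both files listing the pairs in the same order.

```shell
# Interleave two FASTQ files: after each 4-line record of file 1,
# emit the next 4-line record of file 2.
# $1: forward reads, $2: reverse reads (both uncompressed)
interleave_fastq() {
  awk -v r2="$2" '
    { print }
    NR % 4 == 0 {
      for (i = 0; i < 4; i++) {
        if ((getline line < r2) > 0) print line
      }
    }' "$1"
}
```

For example, `interleave_fastq R1.fq R2.fq > interlace.fastq` produces a file in which each forward record is directly followed by its mate.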
Test
Recently sequenced data is used here.
p='R1.fq'
q='R2.fq'
mkdir -p raw_data_qc_reports
mkdir -p Quality30_trimmed_reports
# strip the .fq extension so the outputs are named R1_Q30_trimmed.fastq and R2_Q30_trimmed.fastq
sickle pe -f "$p" -r "$q" -t sanger -o "${p%.fq}_Q30_trimmed.fastq" -p "${q%.fq}_Q30_trimmed.fastq" -s trimmed_singles.fastq -q 30 -l 20
Analyze with fastqc
fastqc --nogroup -o ./raw_data_qc_reports "$p" "$q"
Before trimming
After trimming
a=${p%.fq}_Q30_trimmed.fastq
b=${q%.fq}_Q30_trimmed.fastq
fastqc --nogroup -o ./Quality30_trimmed_reports "$a" "$b"
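Besides the FastQC reports, it can be worth confirming that the two trimmed paired files are still in sync, since orphaned reads are supposed to go to the singles file. The following is a minimal sketch, not something sickle provides. The function name `pairs_in_sync` is hypothetical, and it assumes uncompressed 4-line records whose read IDs match once a trailing /1 or /2 and anything after the first space are ignored.

```shell
# Return 0 if two FASTQ files list the same read IDs in the same order, 1 otherwise.
# $1: forward reads, $2: reverse reads (both uncompressed)
pairs_in_sync() {
  awk -v r2="$2" '
    NR % 4 == 1 {                                   # header line of each forward record
      if ((getline other < r2) <= 0) { bad = 1; exit }   # reverse file ran out of records
      for (i = 0; i < 3; i++) getline skip < r2     # skip the rest of the reverse record
      id1 = $1
      split(other, f, " ")
      id2 = f[1]
      sub(/\/[12]$/, "", id1)                       # drop a trailing /1 or /2
      sub(/\/[12]$/, "", id2)
      if (id1 != id2) { bad = 1; exit }
    }
    END {
      if (!bad && (getline other < r2) > 0) bad = 1 # reverse file has extra records
      exit (bad ? 1 : 0)
    }' "$1"
}
```

Used as `pairs_in_sync R1_Q30_trimmed.fastq R2_Q30_trimmed.fastq && echo "pairs OK"`, it reports whether downstream paired-end tools can safely consume the two files together.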
Citation
Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files (Version 1.33)
Joshi NA, Fass JN. (2011).
[Software]. Available at https://github.com/najoshi/sickle.