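When a library whose insert size is shorter than the sequencing read length is sequenced, adapter and barcode sequences appear at the 3' end of the reads. These contaminating sequences can degrade downstream analyses, so they should be removed during quality control. AlienTrimmer decomposes sequences into k-mers of a user-specified size and quickly detects whether contaminating sequences are present.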
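The idea of k-mer-based detection can be illustrated with a minimal sketch. It is a simplified exact-match version written for this note, not AlienTrimmer's actual implementation: AlienTrimmer also tolerates mismatches (option -m) and trims both read ends. The function names are hypothetical, and the adapter string in the usage example is only an example input.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_alien_index(aliens: list[str], k: int) -> set[str]:
    """Decompose every alien (contaminant) sequence into k-mers and pool them."""
    index: set[str] = set()
    for alien in aliens:
        index |= kmers(alien, k)
    return index


def first_alien_hit(read: str, index: set[str], k: int) -> int:
    """Return the 0-based start of the first read k-mer found in the index, or -1."""
    read = read.upper()
    for i in range(len(read) - k + 1):
        if read[i:i + k] in index:
            return i
    return -1


def trim_3prime(read: str, index: set[str], k: int) -> str:
    """Cut the read at the first alien k-mer hit (exact matches only)."""
    pos = first_alien_hit(read, index, k)
    return read if pos < 0 else read[:pos]


if __name__ == "__main__":
    idx = build_alien_index(["AGATCGGAAGAGC"], k=10)
    print(trim_3prime("ACGTACGTTTGCAGATCGGAAGAGCACAC", idx, k=10))  # ACGTACGTTTGC
```

Because only whole k-mers are looked up, an adapter fragment shorter than k at the very end of a read goes undetected in this sketch. A smaller k catches shorter fragments but also produces more chance hits, which is the trade-off behind the -k option.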
- Manual (PDF)
Included in the extracted tool directory.
Installation
FTP site
ftp://ftp.pasteur.fr/pub/gensoft/projects/AlienTrimmer/
Download AlienTrimmer_0.4.0.tar.gz, extract it, and change into the src directory.
chmod a+x JarMaker.sh
./JarMaker.sh
java -jar AlienTrimmer.jar # check that it runs
user$ java -jar AlienTrimmer.jar
AlienTrimmer v.0.4.0
USAGE: AlienTrimmer [options]
Fast trimming to filter out non-confident nucleotides and alien oligo-
nucleotide sequences (adaptors, primers) in both 5' and 3' read ends
OPTIONS:
-i <infile> [single-ends] FASTQ formatted input file name
-if <infile> [paired-ends] FASTQ formatted input file name containing
forward (fwd) reads
-ir <infile> [paired-ends] FASTQ formatted input file name containing
reverse (rev) reads
-o <outfile> [single-ends] output file name
-of <outfile> [paired-ends] output file name for trimmed fwd reads
-or <outfile> [paired-ends] output file name for trimmed rev reads
-os <outfile> [paired-ends] output file name for remaining trimmed
single (sgl) reads
-c [0-9]* [single-ends] alien sequence id(s) (see option -d)
-d [0-9] displays alien sequences for the specified id:
0: Homopolymers
1: Dimers
2: Trimers
-c <infile> [single-ends] input file name containing user-defined
alien sequence(s) (one line per sequence)
-cf [0-9]* [paired-ends] same as -c for only fwd reads
-cf <infile> [paired-ends] same as -c for only fwd reads
-cr [0-9]* [paired-ends] same as -c for only rev reads
-cr <infile> [paired-ends] same as -c for only rev reads
-k [5-15] k value for k-mer decomposition; must lie between 5 and
15 (default: k=10)
-m <int> allowed mismatch value (default: m=k/2)
-l <int> minimum read length to output; all trimmed reads with
length below this value are filtered out (default: l=15)
-q [0-40] Phred quality score cutoff to trim off low-quality read
ends; must lie between 0 and 20 (default: q=20)
-p [0-100] minimum allowed percentage of correctly called
nucleotides (i.e. with Phred quality score character
higher than q); all reads with a percentage of correctly
called nucleotide lower than this value are filtered out
(default: p=0)
-v displays trimming details during the whole process
EXAMPLES:
[single-ends]
AlienTrimmer -i reads.fq -o trim.fq -c 0 -l 30 -p 80
AlienTrimmer -i reads.fq -c aliens.fa -k 9 -q 10
[paired-ends]
AlienTrimmer -if fwd.fq -ir rev.fq -c alien.fa -q 0 -p 75
AlienTrimmer -if fwd.fq -ir rev.fq -cf alien.fwd.fa -cr alien.rev.fa
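The -q, -l and -p options can be read as three successive filters. The sketch below is one plausible interpretation of the help text, not AlienTrimmer's code. Assumptions: Phred+33 encoding, bases scoring below q are trimmed from both ends, "correctly called" means a score strictly higher than q (as the help says), and the -l and -p filters are applied after trimming. All names are hypothetical.

```python
PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


def phred_scores(qual: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Convert a FASTQ quality string to integer Phred scores."""
    return [ord(c) - offset for c in qual]


def trim_low_quality_ends(seq: str, qual: str, q: int) -> tuple[str, str]:
    """Remove bases scoring below q from both ends (assumed semantics of -q)."""
    scores = phred_scores(qual)
    start, end = 0, len(seq)
    while start < end and scores[start] < q:
        start += 1
    while end > start and scores[end - 1] < q:
        end -= 1
    return seq[start:end], qual[start:end]


def percent_correct(qual: str, q: int) -> float:
    """Percentage of bases whose score is higher than q (used by -p)."""
    scores = phred_scores(qual)
    if not scores:
        return 0.0
    return 100.0 * sum(1 for s in scores if s > q) / len(scores)


def keep_read(seq: str, qual: str, q: int = 20, l: int = 15, p: float = 0):
    """Return the trimmed (seq, qual), or None if the read is filtered out."""
    seq, qual = trim_low_quality_ends(seq, qual, q)
    if len(seq) < l:  # -l: minimum output length
        return None
    if percent_correct(qual, q) < p:  # -p: minimum % of correctly called bases
        return None
    return seq, qual
```

Defaults mirror the help text (q=20, l=15, p=0). Note that the help states the -q range as both [0-40] and "between 0 and 20", so check the manual before relying on values above 20.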
Usage
Run with a file of contaminating (alien) sequences specified.
java -jar AlienTrimmer.jar -if fwd.fq -ir rev.fq -c alien.fa -q 0 -p 75
- -if <infile> [paired-ends] FASTQ formatted input file name containing forward (fwd) reads
- -ir <infile> [paired-ends] FASTQ formatted input file name containing reverse (rev) reads
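- -c <infile> input file name containing user-defined alien sequence(s) (one line per sequence)
- -of / -or / -os <outfile> [paired-ends] output file names for trimmed fwd reads, trimmed rev reads, and remaining single reads (-o is the single-end equivalent)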
- -q [0-40] Phred quality score cutoff to trim off low-quality read ends; must lie between 0 and 20 (default: q=20)
- -p [0-100] minimum allowed percentage of correctly called nucleotides (i.e. with Phred quality score character higher than q); all reads with a percentage of correctly called nucleotide lower than this value are filtered out (default: p=0)
- -v displays trimming details during the whole process
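Test data for 454 and Illumina (sequence reads plus the contaminant sequences to remove) can be downloaded from the FTP site above. The files are compressed with DSRC.
Decompressing them requires an early version of DSRC.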
Citation
AlienTrimmer: a tool to quickly and accurately trim off multiple short contaminant sequences from high-throughput sequencing reads.
Criscuolo A, Brisse S.
Genomics. 2013;102(5-6):500-6.