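XS is a short-read simulator that supports the Ion Torrent, Roche-454, Illumina, and ABI SOLiD formats. It is designed to be lightweight and free of external dependencies. With cloud computing in mind, it provides several run modes that trade off time against memory. It does not use a reference sequence: bases are generated randomly, and by layering parameters on top of that you can use it to study various sequencing biases and characteristics.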
Official website
http://bioinformatics.ua.pt/software/xs/
Manual (README)
https://github.com/pratas/xs/blob/master/README
Installation
wget https://github.com/pratas/xs/archive/master.zip
unzip master.zip
cd xs-master
make
./XS -h # check that the build works
user$ ./XS -h
Usage: XS [OPTION]... [FILE]
System options:
-h give this help
-v verbose mode
Main FASTQ options:
-t <sequencingType> type: 1=Roche-454, 2=Illumina, 3=ABI SOLiD, 4=Ion Torrent
-hf <headerFormat> header format: 1=Length appendix, 2=Pair End
-i n=<instrumentName> the unique instrument name (use n= before name)
-o use the same header in third line of the read
-ls <lineSize> static line (bases/quality scores) size
-ld <minSize>:<maxSize> dynamic line (bases/quality scores) size
-n <numberOfReads> number of reads per file
DNA options:
-f <A>,<C>,<G>,<T>,<N> symbols frequency
-rn <numberOfRepeats> repeats: number (default: 0)
-ri <repeatsMinSize> repeats: minimum size
-ra <repeatsMaxSize> repeats: maximum size
-rm <mutationRate> repeats: mutation frequency
-rr repeats: use reverse complement repeats
Quality scores options:
-qt <assignmentType> quality scores distribution: 1=uniform, 2=gaussian
-qf <statsFile> load file: mean, standard deviation (when: -qt 2)
-qc <template> custom template ascii alphabet
Filtering options:
-eh excludes the use of headers from output
-eo excludes the use of optional headers (+) from output
-ed excludes the use of DNA bases from output
-edb excludes '\n' when DNA bases line size is reached
-es excludes the use of quality scores from output
Stochastic options:
-s <seed> generation seed
<genFile> simulated output file
Common usage:
./XS -v -t 1 -i n=MySeq -ld 30:80 -n 20000 -qt=1 -qc 33,36,39:43 File
./XS -v -ls 100 -n 10000 -eh -eo -es -edb -f 0.3,0.2,0.2,0.3,0.0 -rn 50 -ri 300 -ra 3000 -rm 0.1 File
Move the XS binary to a directory that is on your PATH.
Usage
Example parameters
XS -v -t 1 -i n=MySeq -ld 30:80 -n 20000 -qt=1 -qc 33,36,39:43 File
- -v verbose mode
- -t <sequencingType> type: 1=Roche-454, 2=Illumina, 3=ABI SOLiD, 4=Ion Torrent
- -i n=<instrumentName> the unique instrument name (use n= before name)
- -n <numberOfReads> number of reads per file
- -ld <minSize>:<maxSize> dynamic line (bases/quality scores) size
- -qt <assignmentType> quality scores distribution: 1=uniform, 2=gaussian
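To sanity-check a file produced by the command above, something like the following Python sketch could be used. It is not part of XS. It assumes each record occupies exactly four lines (header, bases, "+" line, quality scores) and that -ld bounds the length of the bases line. The function names read_fastq and summarize are made up for this illustration.

```python
import io


def read_fastq(handle):
    """Yield (header, bases, quality) tuples from four-line FASTQ records.

    Reading stops at end of file or at the first empty header line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        bases = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        yield header, bases, quality


def summarize(handle, min_len, max_len):
    """Count reads, reads outside [min_len, max_len], and bases/quality length mismatches."""
    reads = 0
    out_of_range = 0
    length_mismatch = 0
    for _, bases, quality in read_fastq(handle):
        reads += 1
        if not (min_len <= len(bases) <= max_len):
            out_of_range += 1
        if len(bases) != len(quality):
            length_mismatch += 1
    return {
        "reads": reads,
        "out_of_range": out_of_range,
        "length_mismatch": length_mismatch,
    }


if __name__ == "__main__":
    # Tiny in-memory sample; for real output, pass open("File") with 30 and 80 instead.
    sample = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n")
    print(summarize(sample, 4, 5))
```

For the example command, calling summarize(open("File"), 30, 80) should report 20000 reads with nothing out of range, provided the assumptions about the output layout hold.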
XS -v -ls 100 -n 10000 -eh -eo -es -edb -f 0.3,0.2,0.2,0.3,0.0 -rn 50 -ri 300 -ra 3000 -rm 0.1 File
- -ls <lineSize> static line (bases/quality scores) size
- -eh excludes the use of headers from output
- -eo excludes the use of optional headers (+) from output
- -es excludes the use of quality scores from output
- -rn <numberOfRepeats> repeats: number (default: 0)
- -ri <repeatsMinSize> repeats: minimum size
- -ra <repeatsMaxSize> repeats: maximum size
- -rm <mutationRate> repeats: mutation frequency
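When the same run has to be repeated with different values, it can help to assemble the command line programmatically. The sketch below is a hypothetical Python wrapper, not something shipped with XS. It uses only flags that appear in the help output above, and the names build_xs_command and run_xs are invented for this example.

```python
import shlex
import subprocess


def build_xs_command(output, line_size=100, reads=10000,
                     freqs=(0.3, 0.2, 0.2, 0.3, 0.0),
                     repeats=50, repeat_min=300, repeat_max=3000,
                     mutation_rate=0.1, executable="XS"):
    """Return the argument list for the DNA-only example shown above."""
    if len(freqs) != 5:
        raise ValueError("freqs needs five values: A, C, G, T, N")
    freq_arg = ",".join(str(f) for f in freqs)
    return [
        executable, "-v",
        "-ls", str(line_size),
        "-n", str(reads),
        "-eh", "-eo", "-es", "-edb",   # drop headers, '+' lines, quality scores, line breaks
        "-f", freq_arg,
        "-rn", str(repeats),
        "-ri", str(repeat_min),
        "-ra", str(repeat_max),
        "-rm", str(mutation_rate),
        output,
    ]


def run_xs(command):
    """Run the command; assumes the XS binary can be found on PATH."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only print the command here; call run_xs(cmd) to actually execute it.
    cmd = build_xs_command("File")
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems, and printing the joined command first makes it easy to compare against the usage line from the help output.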
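Further flags are available as well, such as additional options for repeats. The -s option sets the seed of the random number generator, so changing its value changes the generated reads.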
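The role of the seed can be illustrated with a small Python analogy. This is not the generator XS uses internally. It only shows the general idea that a fixed seed plus fixed symbol frequencies (like the five values given to -f) yields the same sequence every time. The function random_bases is made up for this illustration.

```python
import random


def random_bases(length, freqs, seed):
    """Draw `length` symbols from A, C, G, T, N with the given weights and seed."""
    rng = random.Random(seed)  # private generator, so the global state is untouched
    return "".join(rng.choices("ACGTN", weights=freqs, k=length))


if __name__ == "__main__":
    freqs = (0.3, 0.2, 0.2, 0.3, 0.0)
    first = random_bases(60, freqs, seed=1)
    second = random_bases(60, freqs, seed=1)
    other = random_bases(60, freqs, seed=2)
    print(first)
    print("same seed, same sequence:", first == second)
    print("different seed, same sequence:", first == other)
```

Running it twice with the same seed reproduces the sequence exactly, while a different seed will almost certainly give a different one.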
Citation
XS: a FASTQ read simulator
Pratas D, Pinho AJ, Rodrigues JM
BMC Res Notes. 2014 Jan 16;7:40