FSVA: a mapping tool with accuracy close to BWA and several times the speed

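With the arrival of instruments such as the HiSeq X Ten, sequencing throughput keeps rising, but software has not kept pace. Processing 200 GB of data with BWA-MEM takes about 80 hours on a single CPU (and still 10-20 hours on 20 cores). Possible solutions include distributed computing (e.g., pBWA, SparkBWA) and dedicated hardware (e.g., DRAGEN, GPU-enabled aligners), but such environments are not available to everyone and the barrier to entry is high. Fast Seed-and-Vote Aligner (FSVA) is an alignment method for NGS reads. It is reported to align with accuracy close to BWA while running 5-7 times faster than BWA (ref.1). For 200 GB of data, alignment is reported to take about 5 hours using 4 CPUs.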
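As a rough sanity check, the two timings quoted above can be converted to CPU-hours. The sketch below assumes perfect scaling across CPUs, which real aligners do not achieve, so the result is only an order-of-magnitude comparison and not a benchmark.

```shell
# Figures quoted in the paragraph above (200 GB of data).
bwa_hours=80;  bwa_cpus=1    # BWA-MEM on a single CPU
fsva_hours=5;  fsva_cpus=4   # FSVA on 4 CPUs

# CPU-hours = wall-clock hours x number of CPUs (assumes ideal scaling).
bwa_cpu_hours=$((bwa_hours * bwa_cpus))
fsva_cpu_hours=$((fsva_hours * fsva_cpus))

echo "BWA-MEM: ${bwa_cpu_hours} CPU-hours"
echo "FSVA:    ${fsva_cpu_hours} CPU-hours"
echo "Ratio:   $((bwa_cpu_hours / fsva_cpu_hours))x"
```

Under that assumption the ratio comes out at 4x, which is in the same ballpark as the 5-7x speedup claimed for FSVA.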

Installation

GitHub

git clone https://github.com/Topwood91/FSVA.git 
cd FSVA/
make

./fsva # check that it runs

FSVA]$ ./fsva

fsva [options] <ref.fa> <read1.fastq> [read2.fastq]

options:
  -t INT  number of threads. [1]
  -u INT  threshold of unrepresentative seed. [450]
  -l INT  length of seed. If you use this argument when making
          index, you should use this argument here, and their
          value should be equal. [31]
  -r INT  length of read. If you use this argument when making
          index, you should use this argument here, and their
          value should be equal. [150]
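If the build is scripted, it can help to confirm that the expected executables exist before going further. The helper below is a minimal sketch; `check_built` is a hypothetical name, and the assumption that `make` leaves both `fsva` and `index` in the repository directory is inferred from the commands used in this post, not confirmed against the Makefile.

```shell
# Return 0 if every named program exists and is executable in the given directory.
# check_built is a hypothetical helper, not part of FSVA.
check_built() {
  local dir=$1; shift
  local prog missing=0
  for prog in "$@"; do
    if [ ! -x "${dir}/${prog}" ]; then
      echo "missing: ${dir}/${prog}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

From inside the FSVA directory, a call such as `check_built . fsva index` would then report any binary that was not built.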

Usage

Build the index.

index ref.fa

Single-end mapping

fsva -t 12 ref.fa pair1.fq > out.sam

-t INT  number of threads. [1]
-l INT  length of seed. If you set this when building the index, set it here as well, with the same value. [31]
-r INT  length of read. If you set this when building the index, set it here as well, with the same value. [150]
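Because -l and -r must match between indexing and mapping, one way to avoid a mismatch is to keep both values in variables and reuse them. The sketch below only prints the commands so they can be inspected first. It assumes that `index` takes -l and -r in the same form as `fsva`; the help text implies this but does not show the syntax, so check it against the tool before running. `build_cmds` is a hypothetical helper name.

```shell
SEED_LEN=31    # -l, default according to the help text
READ_LEN=150   # -r, default according to the help text

# Print the index and mapping commands instead of running them (dry run).
# ASSUMPTION: `index` accepts -l/-r in the same form as `fsva`.
build_cmds() {
  local ref=$1 reads=$2 out=$3
  echo "index -l ${SEED_LEN} -r ${READ_LEN} ${ref}"
  echo "fsva -t 12 -l ${SEED_LEN} -r ${READ_LEN} ${ref} ${reads} > ${out}"
}

build_cmds ref.fa pair1.fq out.sam
```

Changing `SEED_LEN` or `READ_LEN` in one place then changes both commands together.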

Paired-end mapping

fsva -t 12 ref.fa pair1.fq pair2.fq > out.sam
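For several paired-end samples, the same command can be wrapped in a loop. This is a minimal sketch that assumes files are named `<sample>_1.fq` and `<sample>_2.fq`; that naming convention and the helper names `mate2_of`, `sam_of`, and `map_all` are hypothetical, so adjust them to your own data.

```shell
# Derive the mate-2 and output file names from a mate-1 file name.
# ASSUMPTION: files follow the <sample>_1.fq / <sample>_2.fq convention.
mate2_of() { echo "${1%_1.fq}_2.fq"; }
sam_of()   { echo "${1%_1.fq}.sam"; }

# Map every mate-1 file given after the reference, one sample at a time.
map_all() {
  local ref=$1; shift
  local r1
  for r1 in "$@"; do
    fsva -t 12 "$ref" "$r1" "$(mate2_of "$r1")" > "$(sam_of "$r1")"
  done
}
```

With the index already built, `map_all ref.fa *_1.fq` would write one SAM file per sample.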

Citation

A fast read alignment method based on seed-and-vote for next generation sequencing.

Liu S, Wang Y, Wang.

BMC Bioinformatics. 2016 Dec 23;17(Suppl 17):466.

ref.1

http://www.sciencedirect.com/science/article/pii/S0888754317300204
