LoRMA, a self error-correction tool for long reads

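LoRMA is an error-correction tool for long reads such as PacBio reads. Two kinds of methods have been reported for correcting long reads: hybrid methods that use short reads, and self-correction methods that use only the long reads themselves. LoRMA belongs to the latter group. It uses de Bruijn graphs built from the long reads themselves and corrects errors in two stages: first, LoRDEC corrects the reads using the long reads decomposed into k-mers; second, LoRMA itself performs a further round of correction.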
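To illustrate what "decomposing long reads into k-mers" means, here is a minimal sketch. It is not LoRDEC or LoRMA code. It counts every k-mer in a FASTA file with awk and keeps the k-mers that occur at least a given number of times; in a de Bruijn graph approach, such frequent ("solid") k-mers become the nodes of the graph. The function name `solid_kmers` and the count threshold are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only (not LoRDEC/LoRMA code): count all k-mers in a FASTA file
# and print the k-mers seen at least MIN_COUNT times, with their counts.
# solid_kmers and MIN_COUNT are hypothetical names for this sketch.
# Usage: solid_kmers reads.fasta K MIN_COUNT
solid_kmers() {
  local fasta=$1 k=$2 min_count=$3
  awk -v k="$k" -v min="$min_count" '
    # A header line ends the previous record: count its k-mers, start a new one
    /^>/ { if (seq != "") count_kmers(); seq = ""; next }
    # Sequence lines (possibly wrapped) are concatenated into one read
    { seq = seq toupper($0) }
    END {
      if (seq != "") count_kmers()
      for (km in cnt) if (cnt[km] >= min) print km, cnt[km]
    }
    # Slide a window of length k over the current read
    function count_kmers(   i) {
      for (i = 1; i <= length(seq) - k + 1; i++) cnt[substr(seq, i, k)]++
    }
  ' "$fasta" | LC_ALL=C sort
}
```

For example, with the reads ACGTACGT and ACGTAC, k = 4 and a threshold of 2, the function prints ACGT (3 occurrences), CGTA (2) and GTAC (2); TACG occurs only once and is dropped.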

インストール

I installed it on CentOS.

Dependencies

LoRDEC

LoRDEC is introduced in a separate post.

Official website

https://www.cs.helsinki.fi/u/lmsalmel/LoRMA/

Download the source from the official website, extract it, and build it.

cd LoRMA-0.4/
mkdir build; cd build; cmake ..; make
cd ../
./lorma.sh # check that it runs

# or install it with bioconda
conda install -c bioconda -y lorma
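Because lorma.sh calls LoRDEC internally, it can help to check that all the required commands are reachable before a long run. The sketch below is a generic helper (the name `missing_tools` is hypothetical). It assumes that `lorma.sh` and `LoRMA` are on PATH (true for a bioconda install; for a source build you would add the build directory to PATH) and that `lordec-correct` is the LoRDEC binary lorma.sh invokes.

```shell
#!/usr/bin/env bash
# Sketch: print every command name that cannot be found on PATH.
# Returns 0 if all commands were found, 1 otherwise.
# missing_tools is a hypothetical helper, not part of LoRMA or LoRDEC.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example (command names assumed, see above):
# missing_tools lorma.sh LoRMA lordec-correct || echo "install LoRDEC/LoRMA first"
```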

build$ ./lorma.sh

Usage: ./lorma.sh [-s] [-n] [-start <19> -end <61> -step <21> -threads <6> -friends <7> -k <19>] *.fasta

-s saves the sequence data of intermediate LoRDEC steps

-n skips LoRDEC steps

> ./LoRMA

build]$ ./LoRMA

ERROR: Option '-discarded' is mandatory

ERROR: Option '-output' is mandatory

ERROR: Option '-reads' is mandatory

[LoRMA options]

-bestfriends (1 arg) : Number of best friends [default '3']

-friends (1 arg) : Number of friends [default '7']

-k (1 arg) : kmer length [default '31']

-discarded (1 arg) : output file for discarded reads

-output (1 arg) : output file for corrected reads

-reads (1 arg) : file(s) of long reads

-nb-cores (1 arg) : number of cores [default '1']

-verbose (1 arg) : verbosity level [default '1']
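When the LoRDEC steps have already been done (the situation `lorma.sh -n` is for), the LoRMA binary can also be called directly with the three mandatory options from the help message above. The sketch below only wraps that call; the function name `run_lorma`, the `DRY_RUN` switch and the file names are hypothetical. Note that the binary's own default k is 31, while lorma.sh uses k = 19 for LoRMA by default, so the sketch passes 19 explicitly.

```shell
#!/usr/bin/env bash
# Sketch: build (and run) a direct LoRMA call using the options listed in its help.
# run_lorma and DRY_RUN are hypothetical; set DRY_RUN=1 to print the command only.
# Usage: run_lorma READS OUTPUT DISCARDED [K] [CORES]
run_lorma() {
  local reads=$1 out=$2 discarded=$3 k=${4:-19} cores=${5:-1}
  local cmd=(./LoRMA -reads "$reads" -output "$out" -discarded "$discarded"
             -k "$k" -friends 7 -nb-cores "$cores")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example (file names hypothetical):
# DRY_RUN=1 run_lorma trimmed.fasta final.fasta discarded.fasta 19 4
```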

How to run

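When you run lorma.sh, it runs LoRDEC repeatedly, increasing k each time, to correct errors. It then trims the reads with a tool bundled with LoRDEC and outputs only the corrected reads. These reads are then loaded into LoRMA itself, which performs the second round of correction.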

 lorma.sh -start 19 -end 61 -step 21 -k 19 pacbio.fasta

-start <int>  k for the first LoRDEC step (default: 19)
-end <int>    upper limit for k for the LoRDEC steps (default: 61)
-step <int>   the increase of k between the LoRDEC steps (default: 21)
-k <int>      k value for running LoRMA (default: 19)
-s            save sequence data of the intermediate LoRDEC steps
-n            do not run LoRDEC steps
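With the defaults, the three options -start, -end and -step determine how many LoRDEC rounds are run and with which k. The sketch below lists those k values. It assumes that k starts at -start, grows by -step and stops once it would exceed -end. This is inferred from the option descriptions, not copied from lorma.sh, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the k values of the LoRDEC rounds, assuming k goes from START
# in increments of STEP while k <= END (inferred from the option help).
# Usage: lordec_k_schedule [START] [END] [STEP]
lordec_k_schedule() {
  local start=${1:-19} end=${2:-61} step=${3:-21}
  local k
  for (( k = start; k <= end; k += step )); do
    echo "$k"
  done
}

# lordec_k_schedule            # prints 19, 40, 61 (one per line)
# lordec_k_schedule 19 50 21   # prints 19, 40
```

Under this assumption, the default settings give three LoRDEC rounds (k = 19, 40, 61) before LoRMA runs with k = 19.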

LoRMA supports multithreading (-threads), but be careful: each additional thread uses a considerable amount of extra memory.
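One way to see how memory grows with the thread count on your own data is to run a small test with GNU time and read the peak memory from its report. The sketch assumes GNU time is installed as /usr/bin/time (on CentOS it comes from the "time" package). The helper name and the log and FASTA file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract the peak resident memory (in kbytes) from a log written by
# GNU time's verbose mode (/usr/bin/time -v -o LOG ...).
# peak_rss_kb is a hypothetical helper name.
peak_rss_kb() {
  # GNU time -v writes a line like "Maximum resident set size (kbytes): 123456"
  awk -F': *' '/Maximum resident set size/ { print $2 }' "$1"
}

# Example (file names hypothetical): compare 1 and 4 threads on a small subset
# /usr/bin/time -v -o time_t1.log ./lorma.sh -threads 1 subset.fasta
# /usr/bin/time -v -o time_t4.log ./lorma.sh -threads 4 subset.fasta
# peak_rss_kb time_t1.log; peak_rss_kb time_t4.log
```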

Citation

Accurate self-correction of errors in long reads using de Bruijn graphs

Leena Salmela, Riku Walve, Eric Rivals, and Esko Ukkonen

Bioinformatics. 2017 Mar 15; 33(6): 799–806.


macでインフォマティクス
