BFC, an error correction tool - macでインフォマティクス

About 100 MB of data can be processed in roughly 10 seconds (using 10 threads).

Installation

git clone https://github.com/lh3/bfc.git
cd bfc/
make
./bfc -h # check that the build works

user$ ./bfc -h

Usage: bfc [options] <to-count.fq> [to-correct.fq]

Options:
  -s FLOAT   approx genome size (k/m/g allowed; change -k and -b) [unset]
  -k INT     k-mer length [33]
  -t INT     number of threads [1]
  -b INT     set Bloom filter size to pow(2,INT) bits [33]
  -H INT     use INT hash functions for Bloom filter [4]
  -d FILE    dump hash table to FILE [null]
  -E         skip error correction
  -R         refine bfc-corrected reads
  -r FILE    restore hash table from FILE [null]
  -w INT     no more than 5 ec or 2 highQ ec in INT-bp window [10]
  -c INT     min k-mer coverage [3]
  -Q         force FASTA output
  -1         drop reads containing unique k-mers
  -v         show version number
  -h         show command line help
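
The help text says -b sets the Bloom filter size to pow(2,INT) bits. As a rough aid for judging memory use, the sketch below converts a -b value into bytes. The function name bloom_bytes is made up for this example. The result covers only the Bloom filter itself, not the hash table or any other memory bfc allocates, and since -s is documented to change -k and -b, the default of 33 may not be the value in effect when -s is given.

```shell
# Size of the Bloom filter in bytes for a given -b value.
# The filter holds 2^b bits, so divide by 8 to get bytes.
# bloom_bytes is a hypothetical helper, not part of bfc.
bloom_bytes() {
  # $1 = value passed to -b (integer shell arithmetic, intended for b >= 3)
  echo $(( (1 << $1) / 8 ))
}

b=33   # default shown in the help text
bytes=$(bloom_bytes "$b")
echo "-b $b -> $bytes bytes ($(( bytes >> 20 )) MiB)"
```

With the default of 33 this works out to 2^30 bytes, that is 1 GiB, for the filter alone.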

Move the bfc binary into a directory that is on your PATH.
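
One possible way to do this is sketched below. The target directory ~/bin is an assumption made for the example, and on_path is a hypothetical helper; any directory already listed in PATH works just as well.

```shell
# Return success if the named command can be resolved via PATH.
# on_path is a hypothetical helper for this example.
on_path() {
  command -v "$1" >/dev/null 2>&1
}

# Assumption: ~/bin is the install target; adjust to taste.
install_dir="${HOME}/bin"

# Copy the freshly built binary only if it exists in the current directory.
if [ -x ./bfc ]; then
  mkdir -p "$install_dir"
  cp ./bfc "$install_dir/"
fi

# Report whether the shell can now resolve bfc.
if on_path bfc; then
  echo "bfc found at: $(command -v bfc)"
else
  echo "bfc is not on PATH; add $install_dir to PATH or choose another directory"
fi
```

Run it from inside the cloned bfc/ directory after make has finished.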

Run

Error-correct single-end data with k=33.

bfc -s 3g -k33 -t12 input.fq > correct.fq

Drop reads that contain singleton (unique) k-mers. Both input and output are gzip-compressed.

bfc -1 -s 3g -k33 -t12 corrected.fq.gz | gzip -1 > trimmed.fq.gz
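
To see how many reads the -1 step removed, the read counts of the two files can be compared. This is a minimal sketch that assumes standard 4-line FASTQ records (no line-wrapped sequences) and the file names used in the command above. count_reads is a hypothetical helper, not a bfc feature.

```shell
# Count FASTQ records arriving on stdin, assuming exactly 4 lines per record.
# count_reads is a hypothetical helper for this example.
count_reads() {
  awk 'END { print int(NR / 4) }'
}

# Compare read counts before and after `bfc -1`, if both files are present.
if [ -f corrected.fq.gz ] && [ -f trimmed.fq.gz ]; then
  before=$(gzip -dc corrected.fq.gz | count_reads)
  after=$(gzip -dc trimmed.fq.gz | count_reads)
  echo "reads before: $before, after: $after, dropped: $(( before - after ))"
fi
```

For an uncompressed file, the same helper can be fed directly, for example count_reads < correct.fq.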

Citation

BFC: correcting Illumina sequencing errors.

Li H.

Bioinformatics. 2015 Sep 1;31(17):2885-7. doi: 10.1093/bioinformatics/btv290. Epub 2015 May 6.