About 100 MB of data can be processed in roughly 10 seconds (using 10 threads).
Installation
git clone https://github.com/lh3/bfc.git
cd bfc/
make
./bfc -h  # check that the build runs
user$ ./bfc -h
Usage: bfc [options] <to-count.fq> [to-correct.fq]
Options:
-s FLOAT approx genome size (k/m/g allowed; change -k and -b) [unset]
-k INT k-mer length [33]
-t INT number of threads [1]
-b INT set Bloom filter size to pow(2,INT) bits [33]
-H INT use INT hash functions for Bloom filter [4]
-d FILE dump hash table to FILE [null]
-E skip error correction
-R refine bfc-corrected reads
-r FILE restore hash table from FILE [null]
-w INT no more than 5 ec or 2 highQ ec in INT-bp window [10]
-c INT min k-mer coverage [3]
-Q force FASTA output
-1 drop reads containing unique k-mers
-v show version number
-h show command line help
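The -b default in the help text can be turned into a rough size estimate: 2^33 bits is 1 GiB. The arithmetic below is a minimal sketch that assumes the Bloom filter is a plain bit array of exactly that size; the real memory footprint of bfc may be larger and is not measured here.

```shell
# -b INT sets the Bloom filter size to pow(2,INT) bits; the default is 33.
b=33
bloom_bits=$(( 1 << b ))
# bits -> bytes -> KiB -> MiB
bloom_mib=$(( bloom_bits / 8 / 1024 / 1024 ))
echo "b=${b}: ${bloom_bits} bits = ${bloom_mib} MiB"
```

Each increment of -b doubles this figure, so -b 34 would correspond to about 2 GiB under the same assumption.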
Move the built bfc binary to a directory on your PATH.
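One way to do this, assuming ~/bin as the target directory (a choice made here for illustration; any directory already on your PATH works just as well):

```shell
# Assumption: ~/bin is the install directory. Adjust to taste.
install_dir="${HOME}/bin"
mkdir -p "$install_dir"

# Copy the binary only if the build in the current directory succeeded.
if [ -x ./bfc ]; then
    cp ./bfc "$install_dir/"
fi

# Make the directory visible to the current shell session.
export PATH="$install_dir:$PATH"

# Report where bfc resolves from, or say so if it is not found.
command -v bfc || echo "bfc not found on PATH"
```

To make the change permanent, the export line would go into your shell startup file.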
Run
Error-correct single-end data with k=33.
bfc -s 3g -k33 -t12 input.fq > correct.fq
- -s FLOAT approx genome size (k/m/g allowed; change -k and -b) [unset]
- -k INT k-mer length [33]
- -t INT number of threads [1]
- -c INT min k-mer coverage [3]
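When several samples need the same settings, it can help to generate the command lines first and inspect them before running anything. The sketch below only prints commands; `bfc_cmd` is a hypothetical helper and the sample file names are placeholders, not part of bfc.

```shell
# Hypothetical helper: print the bfc command for one input file.
# $1 = genome size (e.g. 3g), $2 = k-mer length, $3 = threads, $4 = input FASTQ
bfc_cmd() {
    printf 'bfc -s %s -k%s -t%s %s' "$1" "$2" "$3" "$4"
}

# Dry run: print one command per (placeholder) sample without executing it.
for fq in sampleA.fq sampleB.fq; do
    out="${fq%.fq}.corrected.fq"
    echo "$(bfc_cmd 3g 33 12 "$fq") > $out"
done
```

Once the printed commands look right, the loop output can be piped to `sh` to execute them.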
Drop reads containing singleton (unique) k-mers. Both input and output are gzip-compressed.
bfc -1 -s 3g -k33 -t12 corrected.fq.gz | gzip -1 > trimmed.fq.gz
- -1 drop reads containing unique k-mers
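Because -1 discards reads, it is worth checking how many were lost. The sketch below counts FASTQ records before and after trimming; `count_reads` is a hypothetical helper that assumes strict 4-line FASTQ records (no wrapped sequence lines).

```shell
# Hypothetical helper: count FASTQ records in a plain or gzip-compressed file.
# Assumes exactly 4 lines per record.
count_reads() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'END { print NR / 4 }'
}

# Compare the files from the command above, if both are present.
if [ -f corrected.fq.gz ] && [ -f trimmed.fq.gz ]; then
    before=$(count_reads corrected.fq.gz)
    after=$(count_reads trimmed.fq.gz)
    echo "dropped $(( before - after )) of ${before} reads"
fi
```

A large drop may suggest that k is too long for the coverage at hand, though that interpretation depends on the dataset.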
Reference
BFC: correcting Illumina sequencing errors.
Li H.
Bioinformatics. 2015 Sep 1;31(17):2885-7. doi: 10.1093/bioinformatics/btv290. Epub 2015 May 6.