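A FASTQ compression tool with a high compression ratio. Because it compresses so aggressively it is slow, but it produces losslessly compressed files about 1/10 the size of the original.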
Installation
Environment: I installed it on CentOS.
GitHub - mariusmni/lfqc: LFQC: Fastq Compression Algorithm
git clone https://github.com/mariusmni/lfqc.git
cd lfqc/lfqc/
You also need the zpaq and lpaq8 binaries in the zpaq and lpaq8 subfolders. Run each one, and if it does not work, rebuild it with make.
$ ./lpaq8
lpaq8 file compressor (C) 2007, Matt Mahoney
Licensed under GPL, http://www.gnu.org/copyleft/gpl.html
$ ./zpaq
zpaq v7.02 journaling archiver, compiled Jan 14 2018
zpaq archiver for incremental backups with rollback capability.
Usage: zpaq {add|extract|list|test} archive[.zpaq] files... -options...
Files... may be directory trees. Default is the whole archive.
Archive may be "" to test compression or comparison.
* and ? in archive match numbers or digits in a multi-part archive.
Part 0 is the index. If present, no other parts are needed to add or list.
Commands (a,x,l,t) and options may be abbreviated if not ambiguous.
-all [N] Extract/list versions in N [4] digit directories.
-key [password] AES-256 encrypted archive [prompt without echo].
-noattributes Ignore/don't save file attributes or permissions.
-not files... Exclude. * and ? match any string or char.
-only files... Include only matches (default: *).
-summary Be brief.
-test Do not write to disk.
-threads N Use N threads (default: 40).
-to out... Rename files... to out... or all to out/all.
-until N Roll back archive to N'th update or -N from end.
-until 2018-01-14 04:36:49 Set date, roll back (UT, default time: 235959).
add options. archive can be "" to test compression with no output:
-force Add files even if the date is unchanged.
-nodelete Do not mark unmatched files as deleted.
-method L Compress level L (0..5 = faster..better, default 1).
LB Use 2^B MB blocks (0..11, default 04, 14, 26..56).
i Index (file metadata only).
-fragment N Set average dedupe fragment size = 2^N KiB (default: 6).
extract options:
-force Overwrite existing files (default: skip).
list (compare files) options:
-force Compare file contents instead of dates (slower).
-not =[+-#?^] Exclude by comparison result.
-summary [N] Show N largest files/dirs only (default: 20).
Both of these must run and be on your PATH.
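A quick way to confirm the PATH requirement before starting a long compression run is to ask Python's `shutil.which` where each binary resolves. This is a minimal sketch: `missing_tools` and `REQUIRED_TOOLS` are my own hypothetical names, and it only checks that the executables can be found, not that they work (running `./lpaq8` and `./zpaq` by hand as above still covers that).

```python
import shutil
from typing import List

# The two helper compressors that, per this post, must run and be on PATH.
REQUIRED_TOOLS = ["lpaq8", "zpaq"]


def missing_tools(names: List[str]) -> List[str]:
    """Return the names that shutil.which cannot resolve on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        for name in REQUIRED_TOOLS:
            print(name, "->", shutil.which(name))
```

If a tool is reported missing, rebuild it with make in its subfolder and add that folder to PATH.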
Running
Compression
ruby lfqc.rb file.fastq
A 1 GB FASTQ file takes about 10 minutes. When it finishes, file.fastq.lfqc is created.
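To check the roughly 1/10 size figure on your own data, you can compare the sizes of the input FASTQ and the .lfqc output. A minimal sketch: `compression_ratio` is a hypothetical helper, and the script name `ratio.py` in the usage comment is only an example.

```python
import os
import sys


def compression_ratio(original: str, compressed: str) -> float:
    """Return compressed size / original size (0.1 means 1/10 of the original)."""
    original_size = os.path.getsize(original)
    if original_size == 0:
        raise ValueError(f"{original} is empty")
    return os.path.getsize(compressed) / original_size


if __name__ == "__main__":
    # Usage: python ratio.py file.fastq file.fastq.lfqc
    if len(sys.argv) == 3:
        ratio = compression_ratio(sys.argv[1], sys.argv[2])
        print(f"{sys.argv[2]} is {ratio:.1%} of {sys.argv[1]}")
```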
You can also specify the FASTQ type explicitly when compressing:
ruby lfqc.rb -ls454 file.fastq
ruby lfqc.rb -solid file.fastq
ruby lfqc.rb -solexa file.fastq
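If you compress many files from a script, it can help to build the lfqc.rb argument list in one place. A minimal sketch under assumptions: `lfqc_command` and the keys "454", "solid" and "solexa" are my own hypothetical labels for the three flags shown above, and the script is assumed to be run from lfqc/lfqc/ where lfqc.rb lives.

```python
import subprocess
import sys
from typing import List, Optional

# Hypothetical labels mapped to the type flags shown in the commands above.
PLATFORM_FLAGS = {
    "454": "-ls454",
    "solid": "-solid",
    "solexa": "-solexa",
}


def lfqc_command(fastq: str, platform: Optional[str] = None,
                 script: str = "lfqc.rb") -> List[str]:
    """Build the argument list for `ruby lfqc.rb [flag] file.fastq`."""
    cmd = ["ruby", script]
    if platform is not None:
        try:
            cmd.append(PLATFORM_FLAGS[platform.lower()])
        except KeyError:
            raise ValueError(f"unknown type {platform!r}; "
                             f"expected one of {sorted(PLATFORM_FLAGS)}") from None
    cmd.append(fastq)
    return cmd


if __name__ == "__main__":
    # Usage: python compress.py file.fastq [454|solid|solexa]
    if len(sys.argv) >= 2:
        platform = sys.argv[2] if len(sys.argv) >= 3 else None
        subprocess.run(lfqc_command(sys.argv[1], platform), check=True)
```

Passing no type reproduces the plain `ruby lfqc.rb file.fastq` call.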
Decompression:
ruby lfqcd.rb file.fastq.lfqc output.fastq
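Because the compression is lossless, decompressing should give back a byte-identical file. A minimal sketch for checking that with SHA-256 from Python's hashlib: `sha256_of` and `same_content` are hypothetical helper names, and files are read in 1 MiB chunks so large FASTQ files are not loaded into memory.

```python
import hashlib
import sys


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file chunk by chunk and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(a: str, b: str) -> bool:
    """True if both files have the same SHA-256 digest."""
    return sha256_of(a) == sha256_of(b)


if __name__ == "__main__":
    # Usage: python check.py file.fastq output.fastq
    if len(sys.argv) == 3:
        print("identical" if same_content(sys.argv[1], sys.argv[2]) else "DIFFERENT")
```

A mismatch is worth investigating before deleting the original FASTQ.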
Citation
LFQC: a lossless compression algorithm for FASTQ files
Marius Nicolae, Sudipta Pathak, and Sanguthevar Rajasekaran
Bioinformatics. 2015 Oct 15; 31(20): 3276–3281.