macでインフォマティクス (Informatics on Mac)

A collection of notes on informatics for HTS (NGS).

Lighter, an error correction tool

Installation

Installed on CentOS.

GitHub

https://github.com/mourisl/Lighter

git clone https://github.com/mourisl/Lighter.git
cd Lighter/
make

./lighter # check that it runs

$ lighter

Usage: ./lighter [OPTIONS]

OPTIONS:

Required parameters:

-r seq_file: seq_file is the path to the sequence file. Can use multiple -r to specifiy multiple sequence files

            The file can be fasta and fastq, and can be gzip'ed with extension *.gz.

            When the input file is *.gz, the corresponding output file will also be gzip'ed.

-k kmer_length genome_size alpha: (see README for information on setting alpha)

or

-K kmer_length genom_size: in this case, the genome size should be relative accurate.

Other parameters:

-od output_file_directory: (default: ./)

-t num_of_threads: number of threads to use (default: 1)

-maxcor INT: the maximum number of corrections within a 20bp window (default: 4)

-trim: allow trimming (default: false)

-discard: discard unfixable reads. Will LOSE paired-end matching when discarding (default: false)

-noQual: ignore the quality socre (default: false)

-newQual ascii_quality_score: set the quality for the bases corrected to the specified score (default: not used)

-saveTrustedKmers file: save the trusted kmers to specified file then stop (default: not used)

-loadTrustedKmers file: directly get solid kmers from specified file (default: not used)

-zlib compress_level: set the compression level(0-9) of gzip (default: 1)

-h: print the help message and quit

-v: print the version information and quit

Move the lighter binary to a directory on your PATH. To install on a Mac, use the conda package that the authors provide. It can apparently also be installed with brew.
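The "move it onto your PATH" step can be sketched as below. This is a minimal sketch, not the author's procedure: the install directory `$HOME/bin` and the helper name `on_path` are assumptions for illustration, and the copy only happens if a `lighter` binary exists in the current directory (i.e. you are inside the built `Lighter/` checkout). For the conda route, the package is assumed to be the one named `lighter` on the bioconda channel.

```shell
# Assumption: $HOME/bin is (or will be added to) your PATH.
install_dir="$HOME/bin"

# Return success if the given command can be found on PATH.
on_path() {
  command -v "$1" >/dev/null 2>&1
}

# Copy the freshly built binary only if it is present here.
if [ -x ./lighter ]; then
  mkdir -p "$install_dir"
  cp ./lighter "$install_dir/"
fi

# Report whether the shell can now find lighter.
if on_path lighter; then
  echo "lighter found: $(command -v lighter)"
else
  echo "lighter is not on PATH; add $install_dir to PATH"
fi
```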

  

Usage

Correct errors with the genome size specified. Single-end:

lighter -r single.fq -k 17 5000000 0.1 -t 12
  • -t num_of_threads: number of threads to use (default: 1)
  • -od output_file_directory: (default: ./)
  • -k kmer_length genome_size alpha: (see README for information on setting alpha)
  • -r seq_file: seq_file is the path to the sequence file. Can use multiple -r to specify multiple sequence files. The file can be fasta and fastq, and can be gzip'ed with extension *.gz. When the input file is *.gz, the corresponding output file will also be gzip'ed.
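For the alpha argument of -k, the Lighter README gives a rule of thumb of alpha = 7/C, where C is the average coverage. The sketch below estimates C as total bases divided by genome size and derives alpha from it. The function names `count_bases` and `estimate_alpha` are hypothetical helpers, and the FASTQ files are assumed to be uncompressed with exactly four lines per record.

```shell
# Sum the lengths of all sequence lines (line 2 of every 4-line FASTQ record).
count_bases() {
  awk 'NR % 4 == 2 { n += length($0) } END { print n + 0 }' "$@"
}

# alpha = 7 / coverage, with coverage = total_bases / genome_size.
# Arguments: total_bases genome_size
estimate_alpha() {
  awk -v bases="$1" -v genome="$2" \
    'BEGIN { printf "%.3f\n", 7 / (bases / genome) }'
}

# Example with made-up numbers: 350 Mb of reads on a 5 Mb genome is 70x coverage.
estimate_alpha 350000000 5000000
```

With real data the two helpers would be combined as `estimate_alpha "$(count_bases single.fq)" 5000000`. The alpha of 0.1 used in this post corresponds to roughly 70x coverage under this rule.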

 

Paired-end:

lighter -r left.fq -r right.fq -k 17 5000000 0.1 -t 12
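Because the help text warns that -discard loses paired-end matching, it can be worth confirming after a paired-end run that both output files still hold the same number of reads. The following is a minimal sketch: `count_reads` and `same_read_count` are hypothetical helpers, records are assumed to be four lines each, and the output names `left.cor.fq` and `right.cor.fq` are an assumption about how Lighter names its corrected files, so check your output directory.

```shell
# Count reads in a FASTQ file (4 lines per record; plain or gzip'ed).
count_reads() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac | awk 'END { print NR / 4 }'
}

# Succeed only if both files contain the same number of reads.
same_read_count() {
  [ "$(count_reads "$1")" = "$(count_reads "$2")" ]
}

# Example (file names are assumptions, see above):
# same_read_count left.cor.fq right.cor.fq && echo "pairs intact"
```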

 

k=17 is apparently not always the best choice. There are reports that k=13, 15, or 19 produced better assemblies (ref. 1).
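One way to compare k-mer lengths is to run the same paired-end correction once per k and keep each result in its own directory with -od. The sketch below only prints the commands so they can be reviewed first. The function name `sweep_cmd` and the directory names `k13`, `k15`, and so on are assumptions for illustration, and the genome size, alpha, and thread count are the values used in this post.

```shell
# Values taken from the commands in this post; adjust for your data.
genome_size=5000000
alpha=0.1
threads=12

# Print (not run) the command for one k-mer length.
# Output goes to a per-k directory via -od so runs do not overwrite each other.
sweep_cmd() {
  kmer="$1"
  echo "mkdir -p k${kmer} && lighter -r left.fq -r right.fq -k ${kmer} ${genome_size} ${alpha} -t ${threads} -od k${kmer}"
}

for k in 13 15 17 19; do
  sweep_cmd "$k"
done
```

If the script is saved as, say, `sweep_k.sh` (a hypothetical name), piping its output to `sh` executes the printed commands. Each corrected read set can then be assembled separately and the assemblies compared.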

  

Citation

Lighter: fast and memory-efficient sequencing error correction without counting

Li Song, Liliana Florea and Ben Langmead

Genome Biol. 2014;15(11):509.

 

ref.1

Evaluation of the impact of Illumina error correction tools on de novo genome assembly. BMC Bioinformatics.