macでインフォマティクス

A collection of informatics notes related to HTS (NGS).

meryl, a k-mer counting tool

 

meryl is a tool for counting k-mers. The publicly released version is an almost complete rewrite of the 'meryl' that was originally written for Celera Assembler.

 

Manual

https://meryl.readthedocs.io/en/latest/index.html

 

Installation

Build dependencies

  • gcc 7.4.0 or higher
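Before building from source, one way to check this requirement is to compare the installed gcc version against 7.4.0. The following is a minimal sketch and not part of meryl: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (version sort), as in GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of meryl): succeeds if version $1 >= version $2.
# Assumes sort(1) supports -V (version sort), e.g. GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="7.4.0"   # minimum gcc version stated in the build dependencies

if command -v gcc >/dev/null 2>&1; then
    # gcc >= 7 prints the full version with -dumpfullversion;
    # fall back to -dumpversion for compilers that do not accept it.
    installed="$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion)"
    if version_ge "$installed" "$required"; then
        echo "gcc $installed satisfies >= $required"
    else
        echo "gcc $installed is older than $required"
    fi
else
    echo "gcc not found on PATH"
fi
```

On macOS, `gcc` is often an alias for clang, so the reported version may not be a gcc version at all; in that case the check is only a rough guide.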

GitHub

# from release
wget https://github.com/marbl/meryl/releases/download/v1.3/meryl-1.3.Linux-amd64.tar.xz
tar -xJf meryl-1.3.Linux-amd64.tar.xz
export PATH=/path/to/meryl-1.3/build/bin:$PATH

# from source
git clone https://github.com/marbl/meryl.git
cd meryl/src
make -j 24
export PATH=/path/to/meryl/*/bin:$PATH

> meryl

usage: meryl ...

  A meryl command line is formed as a series of commands and files, possibly
  grouped using square brackets.  Each command operates on the file(s) that
  are listed after it.

  COMMANDS:

    statistics           display total, unique, distnict, present number of the kmers on the screen.  accepts exactly one input.
    histogram            display kmer frequency on the screen as 'frequency<tab>count'.  accepts exactly one input.
    print                display kmers on the screen as 'kmer<tab>count'.  accepts exactly one input.

    count                Count the occurrences of canonical kmers in the input.  must have 'output' specified.
    count-forward        Count the occurrences of forward kmers in the input.  must have 'output' specified.
    count-reverse        Count the occurrences of reverse kmers in the input.  must have 'output' specified.
      k=<K>              create mers of size K bases (mandatory).
      n=<N>              expect N mers in the input (optional; for precise memory sizing).
      memory=M           use no more than (about) M GB memory.
      threads=T          use no more than T threads.
      compress           compress homopolymer runs to a single letter.

    less-than N          return kmers that occur fewer than N times in the input.  accepts exactly one input.
    greater-than N       return kmers that occur more than N times in the input.  accepts exactly one input.
    equal-to N           return kmers that occur exactly N times in the input.  accepts exactly one input.
    not-equal-to N       return kmers that do not occur exactly N times in the input.  accepts exactly one input.

    increase X           add X to the count of each kmer.
    decrease X           subtract X from the count of each kmer.
    multiply X           multiply the count of each kmer by X.
    divide X             divide the count of each kmer by X.
    divide-round X       divide the count of each kmer by X and round results. count < X will become 1.
    modulo X             set the count of each kmer to the remainder of the count divided by X.

    union                return kmers that occur in any input, set the count to the number of inputs with this kmer.
    union-min            return kmers that occur in any input, set the count to the minimum count
    union-max            return kmers that occur in any input, set the count to the maximum count
    union-sum            return kmers that occur in any input, set the count to the sum of the counts

    intersect            return kmers that occur in all inputs, set the count to the count in the first input.
    intersect-min        return kmers that occur in all inputs, set the count to the minimum count.
    intersect-max        return kmers that occur in all inputs, set the count to the maximum count.
    intersect-sum        return kmers that occur in all inputs, set the count to the sum of the counts.

    subtract             return kmers that occur in the first input, subtracting counts from the other inputs

    difference           return kmers that occur in the first input, but none of the other inputs
    symmetric-difference return kmers that occur in exactly one input

  MODIFIERS:

    output O             write kmers generated by the present command to an output  meryl database O
                         mandatory for count operations.

  EXAMPLES:

  Example:  Report 22-mers present in at least one of input1.fasta and input2.fasta.
            Kmers from each input are saved in meryl databases 'input1' and 'input2',
            but the kmers in the union are only reported to the screen.

            meryl print \
                    union \
                      [count k=22 input1.fasta output input1] \
                      [count k=22 input2.fasta output input2]

  Example:  Find the highest count of each kmer present in both files, save the kmers to
            database 'maxCount'.

            meryl intersect-max input1 input2 output maxCount

  Example:  Find unique kmers common to both files.  Brackets are necessary
            on the first 'equal-to' command to prevent the second 'equal-to' from
            being used as an input to the first 'equal-to'.

            meryl intersect [equal-to 1 input1] equal-to 1 input2
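The bracket grouping and filter commands in the help text above can be tried on toy data. The following is a minimal sketch that does not come from the original post: the file names `toy1.fa` and `toy2.fa`, the database names, and k=4 are made up for illustration, and the meryl steps are skipped when `meryl` is not on PATH. No expected output is shown because it has not been checked against a real run.

```shell
#!/usr/bin/env bash
# Toy inputs (hypothetical file names and sequences, for illustration only).
printf '>a\nACGTACGTTT\n' > toy1.fa
printf '>b\nACGTACGAAA\n' > toy2.fa

if command -v meryl >/dev/null 2>&1; then
    # Count canonical 4-mers in each file, save them as databases,
    # and print the union with the counts summed across both inputs.
    meryl print \
        union-sum \
          [count k=4 toy1.fa output toy1.meryl] \
          [count k=4 toy2.fa output toy2.meryl]

    # Kmers that occur more than once in toy1 and are also present in toy2.
    # The brackets keep toy2.meryl from becoming an input of 'greater-than'.
    meryl print intersect [greater-than 1 toy1.meryl] toy2.meryl
else
    echo "meryl not found on PATH; skipping the meryl steps" >&2
fi
```

Because `intersect` takes its count from the first input, the second command reports the counts as they appear in `toy1.meryl`.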

 

 

 

 

Usage

Specify the genome FASTA file, the k value, and the output directory.

meryl count k=15 output merylDB genome.fa.gz

The results are saved in merylDB/.
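Once the database exists, the `statistics`, `histogram`, and `print` commands from the help output can be used to look at it. The sketch below assumes the `merylDB` directory written by the count command above. The output file name `merylDB.hist`, the helper `distinct_from_hist`, and the `memory=8 threads=4` values are arbitrary illustrations, and the meryl steps are skipped if `meryl` or the database is missing.

```shell
#!/usr/bin/env bash
db="merylDB"          # database written by the count step
hist="merylDB.hist"   # hypothetical output file name

# Hypothetical helper: sum the second column of a 'frequency<tab>count'
# histogram, which gives the number of distinct kmers it describes.
distinct_from_hist() {
    awk -F '\t' '{ s += $2 } END { print s + 0 }' "$1"
}

if command -v meryl >/dev/null 2>&1 && [ -d "$db" ]; then
    # Summary numbers for the database.
    meryl statistics "$db"

    # Frequency histogram, saved to a file for later plotting.
    meryl histogram "$db" > "$hist"
    echo "distinct kmers in histogram: $(distinct_from_hist "$hist")"

    # First few kmers with their counts ('kmer<tab>count').
    meryl print "$db" | head -n 5
else
    echo "meryl or $db not available; nothing to inspect" >&2
fi

# To cap resources while counting, the help text lists memory= (in GB) and threads=:
#   meryl count k=15 memory=8 threads=4 output merylDB genome.fa.gz
```

The saved histogram is a plain two-column text file, so it can be passed to any plotting tool.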

 

Citation

GitHub - marbl/meryl: A genomic k-mer counter (and sequence utility) with nice features.

 

See also

https://hpc.nih.gov/apps/Meryl.html