ngshmmalign: a sensitive HMM-based aligner (for viruses)

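In current sequencing workflows, NGS reads are aligned with aligners such as bwa (http://bio-bwa.sourceforge.net) or bowtie (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml). These aligners are fast and well suited to transcriptome and exome data from large eukaryotic genomes, but they produce many artifacts on genomes that have accumulated a large number of mutations during their evolution, such as HIV-1 and HCV. These RNA viruses occur as heterogeneous mixtures, and their indels and point mutations cause problems at the alignment stage. To produce more sensitive alignments, ngshmmalign implements a profile HMM, which yields alignments better suited to the study of such viruses without resorting to a global, genome-wide multiple sequence alignment. Profile HMMs are a well-established class of probabilistic graphical models, known for example from HMMER (http://hmmer.org).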
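To make the word "profile" concrete: a profile summarizes a multiple sequence alignment column by column. The following is only a conceptual sketch, not how ngshmmalign builds its HMM. The function name `msa_column_freqs` is hypothetical, and the input is assumed to be an aligned FASTA in which every sequence has the same length.

```shell
# Hypothetical helper: print per-column A/C/G/T frequencies of an aligned FASTA.
# Assumes all sequences have the same aligned length; gap characters and
# ambiguous symbols are skipped when counting.
msa_column_freqs() {
    awk '
        /^>/ { if (seq != "") add(seq); seq = ""; next }   # header: flush previous record
        { seq = seq toupper($0) }                          # a sequence may span several lines
        END {
            if (seq != "") add(seq)
            for (i = 1; i <= ncol; i++) {
                total = cnt[i, "A"] + cnt[i, "C"] + cnt[i, "G"] + cnt[i, "T"]
                if (total == 0) { printf "%d\tNA\n", i; continue }
                printf "%d\tA=%.2f\tC=%.2f\tG=%.2f\tT=%.2f\n", i,
                    cnt[i, "A"] / total, cnt[i, "C"] / total,
                    cnt[i, "G"] / total, cnt[i, "T"] / total
            }
        }
        function add(s,    i, b) {
            if (length(s) > ncol) ncol = length(s)
            for (i = 1; i <= length(s); i++) {
                b = substr(s, i, 1)
                if (b ~ /[ACGT]/) cnt[i, b]++
            }
        }
    ' "$1"
}
```

Calling `msa_column_freqs aligned.fasta` (file name hypothetical) prints one line per alignment column. Columns in which more than one base has a noticeable frequency are the variable sites that make heterogeneous viral populations hard for conventional aligners.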

Installation

Tested on Ubuntu 16.04 (using Docker; host OS macOS 10.14).

To install ngshmmalign, download a release tarball from https://github.com/cbg-ethz/ngshmmalign/releases. You can also install ngshmmalign from a git checkout, although this is not recommended.

Build dependencies

A C++11 compliant compiler. The ngshmmalign codebase makes extensive use of C++11 features.
Boost, version 1.50 or later
Standard Unix utilities, such as sed and make
MAFFT
CMake
Autoconf
Automake

GitHub

#Install with bioconda
conda install -c bioconda -y ngshmmalign


#To build from source, install Boost and export its shared-library directory in LD_LIBRARY_PATH. Change /usr/local/boost_1_68_0/lib to match the path of your Boost libraries.
export LD_LIBRARY_PATH=/usr/local/boost_1_68_0/lib:$LD_LIBRARY_PATH

git clone https://github.com/cbg-ethz/ngshmmalign.git
cd ngshmmalign/ 
./autogen.sh

##Run configure to generate the Makefile, passing the path of the git-cloned ngshmmalign directory as --prefix (the installation prefix).
./configure --prefix=/usr/local/ngshmmalign

#Build and install
make -j 8
make install

> ./ngshmmalign -h

Warning: en_US.UTF-8 could not be imbued, this is likely due to a missing locale on your system
Allowed options:

Generic options:
  -h [ --help ]                   Print this help

Configuration:
  -r arg                          File containing the profile/MSA of the
                                  reference
  -R arg                          File containing the profile/MSA of the
                                  reference. Will perform a comprehensive
                                  parameter estimation using MAFFT. Mutually
                                  exclusive with -r option
  -o arg (=aln.sam)               Filename where alignment will be written to
  -w [ --wrong ] arg (=/dev/null) Filename where alignment will be written
                                  that are filtered (too short, unpaired)
  -t arg (=4)                     Number of threads to use for alignment.
                                  Defaults to number of logical cores found
  -l                              Do not clean up MAFFT temporary MSA files
  -E                              Use full-exhaustive search, avoiding indexed
                                  lookup
  -X                              Replace general aligned state 'M' with '='
                                  (match) and 'X' (mismatch) in CIGAR
  -N arg (=CONSENSUS)             Name of consensus reference contig that will
                                  be created
  -U                              Loci with ambiguous bases get their emission
                                  probabilities according to their allele
                                  frequencies. In practice this is
                                  undesirable, as it leads to systematic
                                  accumulation of gaps in homopolymeric
                                  regions with SNVs
  -s [ --seed ] arg (=42)         Value of seed for deterministic run. A value
                                  of 0 will pick a random seed from some
                                  non-deterministic entropy source
  --hard                          Hard-clip reads. Clipped bases will NOT be
                                  in the sequence in the alignment
  --HARD                          Extreme Hard-clip reads. Do not write
                                  hard-clip in CIGAR, as if the hard-clipped
                                  bases never existed. Mutually exclusive with
                                  previous option
  -v                              Show progress indicator while aligning
  -M arg (=L * 0.8)               Minimum mapped length of read
  -a arg (=0.05)                  Minimum frequency for calling ambiguous base
  --error arg (=0.005)            Global substitution probability
  --go arg (=1e-4)                Gap open probability
  --ge arg (=0.30)                Gap extend probability
  --io arg (=5e-5)                Insert open probability
  --ie arg (=0.50)                Insert extend probability
  --ep arg (=1/L)                 Jump to end probability; usually 1/L, where
                                  L is the average length of the reads
  --lco arg (=0.10)               Left clip open probability
  --lce arg (=0.90)               Left clip extend probability
  --rco arg (=lco/L)              Right clip open probability
  --rce arg (=0.90)               Right clip extend probability
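
Several defaults in the help text are expressed in terms of L, the average read length: -M (L * 0.8), --ep (1/L) and --rco (lco/L, with lco defaulting to 0.10). As a rough sketch of what those values come to for a given data set, the hypothetical helper below computes the mean read length of a FASTQ file and derives the three numbers from it. It assumes a plain four-line-per-record FASTQ, and ngshmmalign may compute L differently internally.

```shell
# Hypothetical helper: mean read length L of a 4-line-per-record FASTQ,
# plus the default values that the help text expresses in terms of L.
ngshmmalign_derived_defaults() {
    awk -v lco=0.10 '
        NR % 4 == 2 { n++; sum += length($0) }   # second line of each record is the sequence
        END {
            if (n == 0) exit 1                   # no reads: nothing to report
            L = sum / n
            printf "L=%.1f\n", L
            printf "M=%.1f\n", L * 0.8           # -M default: L * 0.8
            printf "ep=%.6f\n", 1 / L            # --ep default: 1/L
            printf "rco=%.6f\n", lco / L         # --rco default: lco/L with lco = 0.10
        }
    ' "$1"
}
```

For example, `ngshmmalign_derived_defaults pair_R1.fastq` prints four lines (L, M, ep, rco), which helps when deciding whether to override -M, --ep or --rco explicitly.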

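Usage

Specify the reference FASTA (use the .fasta extension) and the FASTQ file. Passing -R instead of -r makes ngshmmalign perform a comprehensive parameter estimation using MAFFT; the two options are mutually exclusive.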

ngshmmalign -r ref.fasta pair_R1.fastq -o output -v -t 8

-r File containing the profile/MSA of the reference
-R File containing the profile/MSA of the reference. Will perform a comprehensive parameter estimation using MAFFT. Mutually exclusive with -r option
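
After a run, a quick sanity check is to count how many records in the SAM output are mapped to each reference name. The helper below is a minimal sketch that relies only on the general SAM layout (header lines start with @, column 2 is FLAG, column 3 is RNAME). The function name `sam_summary` is hypothetical and nothing in it is specific to ngshmmalign.

```shell
# Hypothetical helper: count SAM alignment records per reference name.
# Header lines start with "@"; column 2 is FLAG, column 3 is RNAME.
sam_summary() {
    awk -F '\t' '
        /^@/ { next }                                # skip header lines
        int($2 / 4) % 2 == 1 { unmapped++; next }    # FLAG bit 0x4 set: read unmapped
        { mapped[$3]++ }                             # tally mapped records by RNAME
        END {
            for (ref in mapped) printf "%s\t%d\n", ref, mapped[ref]
            printf "unmapped\t%d\n", unmapped
        }
    ' "$1"
}
```

With the command above, the alignment is written to the file given by -o, so `sam_summary output` would list the number of aligned records per reference contig followed by the number of unmapped records.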

Citation

GitHub

https://github.com/cbg-ethz/ngshmmalign