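accuMUlate: a variant caller designed for mutation accumulation (MA) experiments

Mutation accumulation (MA) experiments (wiki) are the most widely used method for directly studying the effects of mutation. By sequencing whole genomes from MA lines, researchers can directly study the rate and molecular spectrum of spontaneous mutations, and use these results to understand how mutation contributes to biological processes. At present, no software is designed specifically for identifying mutations from MA lines. This post covers accuMUlate, a probabilistic mutation caller that reflects the design of a typical MA experiment while remaining flexible enough to accommodate properties unique to a particular experiment.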

wiki

https://github.com/dwinter/accuMUlate/wiki

Installation

Tested on Ubuntu. The system cmake was too old, so a newer version was installed following the linked instructions.

Dependencies

BamTools
Eigen
Boost::program_options

GitHub

# eigen3 (a C++ linear algebra library)
apt-get install libeigen3-dev
# for bamtools, see the wiki

git clone https://github.com/dwinter/accuMUlate.git
cd accuMUlate/
mkdir build
cd build
cmake .. -DBamtools_PREFIX=
make

> ./accuMUlate -h

Command line options:
  -h [ --help ]                     Print a help message
  -v [ --version ]                  Print the version number
  -b [ --bam ] arg                  Path to BAM file
  -x [ --bam-index ] arg            Path to BAM index, (defalult is <bam_path>.bai
  -r [ --reference ] arg            Path to reference genome
  -a [ --ancestor ] arg             Ancestor RG sample ID
  -s [ --sample-name ] arg          Sample tags to include
  -q [ --qual ] arg (=13)           Base quality cuttoff
  -m [ --mapping-qual ] arg (=13)   Mapping quality cuttoff
  -p [ --prob ] arg (=0.10000000000000001)
                                    Mutaton probability cut-off
  -o [ --out ] arg                  Out file name (default is std out)
  -i [ --intervals ] arg            Path to bed file
  -c [ --config ] arg               Path to config file
  --header arg                      Alternative header
  --theta arg                       theta
  --nfreqs arg                      Nucleotide frequencies
  --mu arg                          Experiment-long mutation rate
  --seq-error arg                   Probability of sequencing error
  --ploidy-ancestor arg (=2)        Polidy of ancestor (1 or 2)
  --ploidy-descendant arg (=2)      Ploidy of descendant (1 or 2)
  --phi-haploid arg                 Over-dispersion for haploid sequencing
  --phi-diploid arg                 Over-dispersion for diploid sequencing

> ./denominate -h

Command line options (not: all options can be set via configuration file):
  -h [ --help ]                     Print a help message
  -v [ --version ]                  Print the version number
  -b [ --bam ] arg                  Path to BAM file
  -x [ --bam-index ] arg            Path to BAM index, (defalult is <bam_path>.bai
  -r [ --reference ] arg            Path to reference genome
  -c [ --config ] arg               Path to config file
  -i [ --intervals ] arg            Path to bed file
  -a [ --ancestor ] arg             Ancestor RG sample ID
  -s [ --sample-name ] arg          Sample tags
  -q [ --qual ] arg (=13)           Base quality cuttoff
  -m [ --mapping-qual ] arg (=13)   Mapping quality cuttoff
  -p [ --prob ] arg (=0.10000000000000001)
                                    Prob quality cuttoff
  --header arg                      Alternative header
  --theta arg                       theta
  --nfreqs arg                      Nucleotide frequencies
  --mu arg                          Experiment-long mutation rate
  --seq-error arg                   Probability of sequencing error
  --ploidy-ancestor arg (=2)        Polidy of ancestor (1 or 2)
  --ploidy-descendant arg (=2)      Ploidy of descendant (1 or 2)
  --phi-haploid arg                 Over-dispersion for haploid sequencing
  --phi-diploid arg                 Over-dispersion for diploid sequencing
  --min-depth arg (=0)              Mimimum sequencing depth for a site to be included
  --max-depth arg (=4294967295)     Maximum sequencing depth for a site to be included
  --min-mutant-strand arg (=0)      Minimum number of alleles supporting the mutant on each strand
  --max-anc-in-mutant arg (=4294967295)
                                    Maximum number of ancestral alleles in mutant sample
  --max-mutant-in-anc arg (=4294967295)
                                    Maximum number of mutant alleles in ancestral samples
  --max-MQ-AD arg (=inf)            Maximum value of the AD test for mapping quality differences
  --max-insert-AD arg (=inf)        Maximum value of the AD test for insert length differences
  --min-strand-pval arg (=0)        Minimum p-value for strand bias
  --min-mapping-pval arg (=0)       Minimum p-value for paired-mapping bias

Test run

accuMUlate -c test/data/example_params.ini \
-b test/data/test.bam \
-r test/data/test.fasta \
-i test/data/test.bed

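accuMUlate requires a single BAM file containing the data from all samples, mapped to the reference genome. Because all of the sequencing data are in one file, the BAM header must contain read group information so that each read can be assigned to a sample (see example_params.ini). If the data are spread across multiple BAM files (perhaps one per sample), generate a single BAM file with samtools merge (using the -r flag to create @RG headers) or with Picard's MergeSamFiles.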
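Before running accuMUlate on a merged file, it can be worth confirming that the header really carries read group lines. The following is a minimal sketch, not part of accuMUlate: it assumes samtools is installed and that sample names are stored in the SM field of each @RG line. The function name list_rg_samples and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# List the unique sample names (SM fields) found on @RG header lines.
# Reads SAM header text on stdin, so it can be fed by `samtools view -H`.
list_rg_samples() {
  awk -F'\t' '
    $1 == "@RG" {
      # Scan the tab-separated tags of the read group line for SM:<name>
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SM:/) print substr($i, 4)
      }
    }' | LC_ALL=C sort -u
}

# Usage (all file names are placeholders):
#   samtools merge -r merged.bam sample1.bam sample2.bam
#   samtools view -H merged.bam | list_rg_samples
```

If the function prints nothing, the header has no @RG lines with an SM field, and reads cannot be assigned to samples as described above.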

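Example output

The output is a tab-delimited table (wiki). The first three columns match the BED format (chromosome, start, end), so the file can be used as-is in downstream analyses or in subsequent accuMUlate calls (for example, to test the effect of different values of the mutation probability parameter). In addition to the position of each putative mutation, the output contains the mutant sample, the mutant and ancestral genotypes, and a set of summary statistics that can be used to identify calls that may be false positives (detailed).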
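Since the first three columns follow BED, re-running accuMUlate on only the candidate sites can be sketched as below. This is an illustration under assumptions: the result table is assumed to have no header line, and the function name calls_to_bed and all file names are hypothetical.

```shell
#!/usr/bin/env bash
# Keep only the first three tab-separated columns (chromosome, start, end)
# of an accuMUlate result table, producing a BED file of candidate sites.
calls_to_bed() {
  cut -f1-3 "$1"
}

# Usage (file names are placeholders):
#   calls_to_bed calls.tsv > candidates.bed
#   accuMUlate -c params.ini -b merged.bam -r ref.fasta -i candidates.bed -p 0.5
```

Restricting the second run to the candidate intervals keeps it short, which is convenient when comparing several values of -p.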

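Calculating the denominator
Very often, the goal of a mutation accumulation experiment is to estimate the mutation rate. This requires knowing both the number of mutations (the numerator) and the number of sites at which a mutation could have been detected (the denominator). accuMUlate ships with a program called denominate that counts the callable sites. denominate takes all of the arguments that accuMUlate does, and additionally accepts the filtering criteria described in the repository.
The output of denominate is a single line of callable-site counts. The first four values are the numbers of callable ancestral "A", "C", "G", and "T" bases for the first sample, and each subsequent sample is represented by four more columns. The accumulate-tools repository contains an R script with functions for reading and manipulating these files.
An rmarkdown document for downstream analysis is also available (link).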
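The numerator and denominator described above combine into a rate estimate as mutations / (callable sites x generations). The sketch below illustrates that arithmetic only and is not the authors' analysis: it assumes the denominate counts are whitespace-separated on the first line of a file, and that every line was propagated for the same number of generations. The function name mutation_rate and the file name are hypothetical.

```shell
#!/usr/bin/env bash
# Estimate a per-site, per-generation mutation rate.
#   $1: file holding the single line of counts printed by denominate
#   $2: number of mutations called (numerator)
#   $3: number of generations of mutation accumulation (assumed equal for all lines)
mutation_rate() {
  awk -v n_mut="$2" -v gens="$3" '
    NR == 1 {
      total = 0
      # Sum callable sites over every sample and every ancestral base
      for (i = 1; i <= NF; i++) total += $i
      if (total == 0 || gens == 0) { print "NA"; exit }
      printf "%.3e\n", n_mut / (total * gens)
    }' "$1"
}

# Usage (file name and numbers are placeholders):
#   denominate -c params.ini -b merged.bam -r ref.fasta -i regions.bed > callable.txt
#   mutation_rate callable.txt 42 1000
```

Summing across samples treats each sample's callable sites as independent opportunities to observe a mutation, so a site callable in ten samples counts ten times.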

Citation

accuMUlate: a mutation caller designed for mutation accumulation experiments
David J Winter, Steven H Wu, Abigail A Howell, Ricardo B R Azevedo, Rebecca A Zufall, Reed A Cartwright

Bioinformatics. 2018 Aug 1; 34(15): 2659–2660