adVNTR: genotyping variable number tandem repeats (VNTRs)

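Whole-genome sequencing is increasingly used to identify Mendelian variants in clinical pipelines. These pipelines focus on single-nucleotide variants (SNVs) and structural variants, and ignore more complex repeat sequence variants. The paper considers the problem of genotyping variable number tandem repeats (VNTRs), which consist of inexact tandem duplications of short (6-100 bp) repeat units. VNTRs span 3% of the human genome, are frequent in coding regions, and cause multiple Mendelian disorders. Existing tools can recognize sequences that contain VNTRs, but genotyping VNTRs (determining the repeat unit count and sequence variation) from whole-genome sequencing reads remains difficult. The study describes adVNTR, a method that models each VNTR with a hidden Markov model in order to count repeat units and detect sequence variation.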

Documentation

Quick Start — adVNTR 1.1.1 documentation

Installation

Tested in a Python 2.7 virtual environment on macOS 10.14.

Source: GitHub

#bioconda (link)
conda create -n advntr-env python=2.7
conda activate advntr-env
conda install -c bioconda -y advntr

$ advntr -h

=======================================================
adVNTR 1.3.3: Genopyting tool for VNTRs
=======================================================
Source code: https://github.com/mehrdadbakhtiari/adVNTR
Instructions: http://advntr.readthedocs.io
-------------------------------------------------------

usage: advntr <command> [options]

Command: genotype       find RU counts and mutations in VNTRs
         viewmodel      view existing models in database
         addmodel       add custom VNTR to the database
         delmodel       remove a model from database

advntr: error: too few arguments

advntr genotype -h

usage: advntr genotype [options]

Input/output options:
  -a/--alignment_file <file>      alignment file in SAM/BAM/CRAM format
  -r/--reference_filename <file>  path to a FASTA-formatted reference file for CRAM files. It overrides
                                  filename specified in header, which is normally used to find the reference
  -f/--fasta <file>               Fasta file containing raw reads
  -p/--pacbio                     set this flag if input file contains PacBio reads instead of Illumina reads
  -n/--nanopore                   set this flag if input file contains Nanopore MinION reads instead of
                                  Illumina
  -o/--outfile <file>             file to write results. adVNTR writes output to stdout if oufile is not
                                  specified.
  -of/--outfmt <format>           output format. Allowed values are {text, bed} [text]

Algorithm options:
  -fs/--frameshift                set this flag to search for frameshifts in VNTR instead of copy number.
                                  Supported VNTR IDs: [25561, 519759]
  -e/--expansion                  set this flag to determine long expansion from PCR-free data
  -c/--coverage <float>           average sequencing coverage in PCR-free sequencing
  --haploid                       set this flag if the organism is haploid
  -naive/--naive                  use naive approach for PacBio reads

Other options:
  -h/--help                       show this help message and exit
  --working_directory <path>      working directory for creating temporary files needed for computation
  -m/--models <file>              VNTR models file [vntr_data/hg19_selected_VNTRs_Illumina.db]
  -t/--threads <int>              number of threads [1]
  -u/--update                     set this flag to iteratively update the model
  -vid/--vntr_id <text>           comma-separated list of VNTR IDs

Test run

A run requires either a pre-trained model or a model trained by the user.

#pre-trained model
wget https://cseweb.ucsd.edu/~mbakhtia/adVNTR/vntr_data_recommended_loci.zip
unzip vntr_data_recommended_loci.zip

$ ls -l vntr_data/

total 44280
-rw-r--r-- 1 kazuma staff 9207808 3 22 2019 hg19_selected_VNTRs_Illumina.db
-rw-r--r-- 1 kazuma staff 13463552 3 22 2019 hg19_selected_VNTRs_Pacbio.db
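
Before starting a run, it can help to confirm that both model databases from the listing above are actually present and non-empty. The following is only a minimal sketch: check_models is a hypothetical helper name, not part of adVNTR, and the default directory vntr_data is assumed to sit in the current working directory.

```shell
# check_models: hypothetical helper (not an adVNTR command).
# Returns 0 if both model databases exist and are non-empty in the given
# directory (default: vntr_data), otherwise prints what is missing and returns 1.
check_models() {
    dir=${1:-vntr_data}
    missing=0
    for db in hg19_selected_VNTRs_Illumina.db hg19_selected_VNTRs_Pacbio.db; do
        if [ ! -s "$dir/$db" ]; then
            echo "missing or empty: $dir/$db" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_models with no argument checks ./vntr_data; a non-zero exit status suggests the archive was not downloaded completely or not unpacked in this directory.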

#bam
https://cseweb.ucsd.edu/~mbakhtia/adVNTR/quickstart/

Run the command with the BAM file specified. The unpacked model directory vntr_data/ is recognized automatically.

mkdir log_dir
advntr genotype --vntr_id 301645 --alignment_file CSTB_2_5_testdata.bam --working_directory ./log_dir/
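
For repeated runs, the same call could be wrapped in a small function so that results go to a file and the thread count is explicit. This is a sketch under assumptions: run_genotype and the DRY_RUN variable are hypothetical names introduced here, and result.txt is a placeholder output path. The options --outfile and --threads are taken from the `advntr genotype -h` output shown earlier.

```shell
# run_genotype: hypothetical wrapper around `advntr genotype`.
# Arguments: VNTR ID, alignment file, output file, optional thread count (default 1).
# With DRY_RUN=1 the command line is printed instead of executed.
run_genotype() {
    vid=$1
    bam=$2
    out=$3
    threads=${4:-1}
    workdir=./log_dir
    # Build the argument list from options listed in `advntr genotype -h`.
    set -- advntr genotype \
        --vntr_id "$vid" \
        --alignment_file "$bam" \
        --working_directory "$workdir" \
        --outfile "$out" \
        --threads "$threads"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
        return 0
    fi
    mkdir -p "$workdir"
    "$@"
}
```

For example, `DRY_RUN=1 run_genotype 301645 CSTB_2_5_testdata.bam result.txt 4` prints the command for inspection, and the same call without DRY_RUN runs it.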

Citation

Targeted genotyping of variable number tandem repeats with adVNTR
Mehrdad Bakhtiari, Sharona Shleizer-Burko, Melissa Gymrek, Vikas Bansal, Vineet Bafna

Genome Research, 28(11), pp.1709-1719