Updated 2021/8/20
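Tangram is a structural variation (SV) detection tool specialized for transposons. It uses the read-pair and split-read algorithms common in SV detection to detect transposons with high sensitivity, and it was used as a mobile element detection tool in the 1000 Genomes Project. Many transposon detection tools have been reported, but Tangram uses both split-read and read-pair information and can locate transposon insertion sites at single-nucleotide resolution.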
Installation
Dependencies
- g++ 4.2.0 and above
- zlib
- pthread lib
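The compiler requirement above can be checked before building. The following is a minimal sketch, not part of the Tangram documentation: `version_ge` is a hypothetical helper, and it assumes a `sort` that supports `-V` (version sort, available in GNU coreutils).

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Assumes "sort -V" (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v g++ >/dev/null 2>&1; then
  gxx_version="$(g++ -dumpversion)"
  if version_ge "$gxx_version" "4.2.0"; then
    echo "g++ $gxx_version is new enough"
  else
    echo "g++ $gxx_version is older than 4.2.0"
  fi
else
  echo "g++ not found"
fi
```

zlib and pthread are usually provided by the system development packages; the sketch does not check for them.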
Download and build. (GitHub no longer serves the unauthenticated git:// protocol, so clone over https.)
git clone https://github.com/jiantao/Tangram.git
cd Tangram/src/
make -j
> ./tangram_scan
$ ./tangram_scan
Usage: tangram_scan [options] -in <input_file_list> -dir <output_dir>
Mandatory arguments: -in FILE the list of input bam files
-dir STRING the path to the output dir (must be empty or non-existing)
Options: -cf FLOAT threashold for normal read pair in the fragment length distribution[0.01 total for both side]
-tr FLOAT trim rate for the fragment length distribution[0.02 total for both side]
-mq INT minimum mapping quality for a normal read pair
-mf INT minimum number of nomral fragments in a library[10000]
-help print this help message
> ./tangram_bam
$ ./tangram_bam
Usage: tangram_bam [options] -i <in_bam> -r <ref_fa> -o <out_bam>
Mandatory arguments:
-i --input FILE The input of bam file [stdin].
-r --ref FILE The input of special reference file.
-o --output FILE The output of bam file [stdout].
Options:
-h --help Print this help message.
-t --target-ref-name STRING Chromosome region.
-m --required-match INT The number of required matches.
between reads and special references [50].
Notes:
1. tangram_bam will add ZA tags that are required for the following detection.
> ./tangram_merge
$ ./tangram_merge
Usage: tangram_merge -dir <input_dir>
Mandatory arguments: -dir STRING the path to the dir contains all the fragment length distribution files
-help print this help message
> ./tangram_filter.pl
$ ./tangram_filter.pl
Usage: ./tangram_filter [options] --vcf <input_vcf> --msk <mask_input_list>
Mandatory Arguments: --vcf FILE input vcf file for filtering
--msk FILE input list of mask files with window size information
Options: --type STRING SV event type for filtering, choosing from "DEL", "DUP", "INV" and "MEI" (case insensitive) [MEI]
--rpf INT Minimum number of supporting fragments (reads) for read-pair events. For MEI events, this threshold
must be satisfied for reads from both 5' and 3' [2]
--srf INT Minimum number of supporting fragments (reads) for split-read events [2]
--out FILE Output of filtered and sorted VCF file [stdout]
--help Show this help message
Note:
1. This script require the installation of "bedtools" package and Unix
sort in the default directory.
2. Each entry of the list of mask files is a tab delimited file
with following format:
"TYPE WINDOW_SIZE FILE_NAME"
"TYPE" (string) is the type of this mask file. For a referenced MEI
mask file, it must match the first two characters of the family name
in the VCF file (For example AL: ALU, L1: L1, SV: SVA and HE: HERV).
This mask file will only be applied to the corresponding type of MEI
events. For example, AL mask file will only be applied to ALU insertions.
The rest of the mask files, such as segmental duplication
mask and simple repeat mask, their "TYPE" string can be anything and
it will be applied to all the entries in the VCF file. No space is allowed
in the type name.
"WINDOW_SIZE" (integer) is the window size around each entry of
the mask file.
"FILE_NAME" (string) is the path to the corresponding mask file
All the mask files must be in the BED format. For detailed information
about this format, please check http://genome.ucsc.edu/FAQ/FAQformat.html
Add bin/ to your PATH.
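A minimal sketch of the PATH setup, assuming the repository was cloned into `$HOME/Tangram` (`TANGRAM_HOME` is a hypothetical variable introduced here, not something Tangram reads). The loop only reports which of the tools listed above are reachable afterwards.

```shell
# TANGRAM_HOME is a hypothetical variable pointing at the cloned repository.
TANGRAM_HOME="${TANGRAM_HOME:-$HOME/Tangram}"
export PATH="$TANGRAM_HOME/bin:$PATH"

# Report which of the tools shown above can now be found.
for tool in tangram_scan tangram_bam tangram_merge tangram_filter.pl; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done
```

To make the setting persistent, the two assignment lines can be appended to a shell startup file such as `~/.bashrc`.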
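Usage
Transposons are detected with the following workflow.
(From the manual in the Git repository)
Tangram requires a BAM with ZA tags, as produced by aligning with MOSAIK. If your BAM does not have them, first add ZA tags to it. If the FASTA index does not exist, it is created automatically at this step.
step1 Add ZA tags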
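Whether a BAM already carries ZA tags can be checked before deciding to run the tagging step. This is a minimal sketch under two assumptions that the original does not state: samtools is installed, and ZA tags appear as ordinary optional SAM fields beginning with `ZA:`. `has_za_tag` is a hypothetical helper.

```shell
# Hypothetical helper: reads SAM text on stdin and succeeds if any of the
# first 1000 alignment records has an optional field starting with "ZA:".
has_za_tag() {
  awk -F'\t' '
    /^@/ { next }                      # skip header lines
    {
      n++
      for (i = 12; i <= NF; i++) if ($i ~ /^ZA:/) { found = 1; exit }
      if (n >= 1000) exit              # stop after sampling 1000 records
    }
    END { exit (found ? 0 : 1) }
  '
}

# Usage (requires samtools):
#   if samtools view input.bam | has_za_tag; then
#     echo "ZA tags present"
#   else
#     echo "no ZA tags found; run tangram_bam first"
#   fi
```

Only the first 1000 records are sampled, so a negative result on a very heterogeneous BAM is a hint rather than proof.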
tangram_bam -i input.bam -r ref.fa -o ZA-tagged.bam
Tags of the form <@/&:MQ1:MQ2:SP_REF:NUM_MAP:CIGAR:MD> are added to the BAM. See the manual in the Git repository for details.
step2 Scan the BAM
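tangram_scan -in bam_list.txt -dir output
As the usage message above states, -in takes a list of input BAM files rather than a BAM itself; here bam_list.txt is a text file listing the path of ZA-tagged.bam.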
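According to the usage message, `-in` expects a list of BAM files and `-dir` must be empty or non-existing. The following is a minimal sketch of preparing both. It assumes the list is a plain text file with one BAM path per line (the help text does not spell out the layout), and `bam_list.txt`, `scan_out` and `dir_is_usable` are hypothetical names.

```shell
bam_list="bam_list.txt"   # hypothetical list file name
out_dir="scan_out"        # hypothetical output directory

# Assumed layout: one BAM path per line.
printf '%s\n' "ZA-tagged.bam" > "$bam_list"

# Succeeds when the path does not exist yet, or is an empty directory.
dir_is_usable() {
  [ ! -e "$1" ] || { [ -d "$1" ] && [ -z "$(ls -A "$1")" ]; }
}

if dir_is_usable "$out_dir"; then
  if command -v tangram_scan >/dev/null 2>&1; then
    tangram_scan -in "$bam_list" -dir "$out_dir"
  else
    echo "tangram_scan not found in PATH" >&2
  fi
else
  echo "$out_dir exists and is not empty; choose another directory" >&2
fi
```

Checking the directory first avoids launching a scan that would be rejected because of leftover files from an earlier run.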
step3 Index
tangram_index -ref -sp output/input.bam -out output/
step4 Detect transposons
tangram_detect
step5 Filtering
tangram_filter
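The filter step needs the detection VCF and a mask list in the "TYPE WINDOW_SIZE FILE_NAME" format described in the tangram_filter.pl help output. The following is a minimal sketch: the file names (`detected.vcf`, `alu_mask.bed`, `segdup_mask.bed`, `filtered.vcf`), the window sizes and the `check_mask_list` helper are all hypothetical, and only flags that appear in the help output are used. The command is printed rather than executed.

```shell
mask_list="mask_list.txt"   # hypothetical file name

# One mask per line: TYPE <tab> WINDOW_SIZE <tab> FILE_NAME
# "AL" is applied only to ALU insertions; per the help text, a mask with any
# other TYPE string (here "SEGDUP") is applied to every VCF entry.
# Window sizes are illustrative, not recommended values.
{
  printf 'AL\t100\talu_mask.bed\n'
  printf 'SEGDUP\t50\tsegdup_mask.bed\n'
} > "$mask_list"

# Hypothetical sanity check: three tab-separated fields, an integer window
# size, and no space in the TYPE name.
check_mask_list() {
  awk -F'\t' '
    NF != 3 || $2 !~ /^[0-9]+$/ || index($1, " ") > 0 { bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

if check_mask_list "$mask_list"; then
  # Printed rather than executed; the defaults from the help text are spelled out.
  echo "tangram_filter.pl --vcf detected.vcf --msk $mask_list --type MEI --rpf 2 --srf 2 --out filtered.vcf"
else
  echo "malformed mask list: $mask_list" >&2
fi
```

As the help output notes, the script also needs bedtools and Unix sort to be available, and every mask file must be in BED format.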
Citation
Tangram: a comprehensive toolbox for mobile element insertion detection
Jiantao Wu, Wan-Ping Lee, Alistair Ward, Jerilyn A Walker, Miriam K Konkel, Mark A Batzer and Gabor T Marth
BMC Genomics 2014 15:795
Related