Estimating telomere repeat counts in tumor samples with telomerehunter

Updated 2020/4/20: fixed typos.

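Telomeres are nucleoprotein complexes at the ends of eukaryotic chromosomes. In humans, telomeric DNA consists mainly of non-coding t-type (TTAGGG) repeats. However, telomere variant repeats (TVRs) such as c- (TCAGGG), g- (TGAGGG) and j-type (TTGGGG) repeats, as well as other hexameric sequence variations, also exist [ref.1, 2, 3]. Telomeres shorten with every cell division [ref.4], and once a critical telomere length is reached, a DNA damage response is triggered that leads to cellular senescence or apoptosis [ref.5, 6].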
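Because reads come from both DNA strands, these hexamers also show up in sequencing data as their reverse complements (e.g. TTAGGG appears as CCCTAA on the C-rich strand). TelomereHunter generates the reverse complements itself (see the -r option), so this is only for orientation: a small bash sketch, where `revcomp` is a hypothetical helper and not part of the tool, that prints each repeat type next to its complementary form.

```bash
#!/usr/bin/env bash
# revcomp SEQ -> reverse complement of a DNA sequence (A<->T, C<->G), case preserved.
# Hypothetical helper for illustration; TelomereHunter does this internally.
revcomp() {
  local seq=$1 rev="" i
  # Reverse the string character by character (pure bash, no external `rev`).
  for (( i = ${#seq} - 1; i >= 0; i-- )); do
    rev+=${seq:i:1}
  done
  # Complement each base.
  printf '%s\n' "$rev" | tr 'ACGTacgt' 'TGCAtgca'
}

# t-, c-, g- and j-type repeats and the form they take on the opposite strand.
for repeat in TTAGGG TCAGGG TGAGGG TTGGGG; do
  printf '%s\t%s\n' "$repeat" "$(revcomp "$repeat")"
done
```

This prints TTAGGG/CCCTAA, TCAGGG/CCCTGA, TGAGGG/CCCTCA and TTGGGG/CCCCAA.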

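To escape the limit on the number of cell divisions, tumors use a telomere maintenance mechanism (TMM): either activation of telomerase [ref.7] or alternative lengthening of telomeres (ALT) [ref.8]. Telomerase is an enzyme that adds t-type repeats to chromosome ends [ref.9]. In contrast, ALT is based on recombination of telomeric regions, which results in several characteristic features, including telomeres of heterogeneous length [ref.8] and sequence composition [ref.3, 10].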

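These TMMs are crucial for tumorigenesis and are valuable drug targets for cancer therapy [ref.11]. However, accurately identifying and inhibiting these mechanisms in different tumor types requires more insight into the different telomere structures. Over the past decades, several experimental methods for assessing telomere length and ALT status have been established, such as telomere qPCR, terminal restriction fragment (TRF) analysis and the C-circle assay [ref.12, 13].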

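With advances in high-throughput sequencing, alternative methods for measuring telomere content have emerged. Several studies have shown that the number of short reads containing telomere repeats in whole-genome sequencing (WGS) data can be used to estimate telomere content, with results comparable to those of established experimental methods [ref.10, 14, 15, 16, 17, 18]. As described in several recently published cancer studies [ref.19, 20, 21], this kind of analysis provides valuable insight into telomere features in cancer data. Here we present TelomereHunter, a new computational tool for determining telomere content that is specifically designed for matched tumor/control pairs. In contrast to existing tools, TelomereHunter takes alignment information into account and reports the abundance of variant repeats within telomeric sequences. In this paper, we present the main features of TelomereHunter, discuss the interpretation of exemplary results in ALT-positive and ALT-negative tumor samples, characterize the tool in comparison with biological assays for telomere content estimation, and evaluate the impact of different sequencing protocols on telomere content quantification.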
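To make the read-counting idea concrete: TelomereHunter reports a GC-corrected telomere content, which is assumed here to mean the number of intratelomeric reads per million reads whose GC content lies in the window set by -gc1/-gc2 (48-52% by default, close to the 50% GC of TTAGGG). Please check the paper for the exact definition. The bash sketch below uses hypothetical helpers (`in_gc_window`, `telomere_content`) and treats the window bounds as inclusive, which is also an assumption.

```bash
#!/usr/bin/env bash
# in_gc_window SEQ [LOWER] [UPPER]
# Succeeds if the read's GC percentage lies in [LOWER, UPPER] (inclusive, assumed),
# defaulting to TelomereHunter's -gc1/-gc2 defaults of 48 and 52.
in_gc_window() {
  local seq gc lower=${2:-48} upper=${3:-52}
  seq=$(printf '%s' "$1" | tr 'acgt' 'ACGT')
  [ -n "$seq" ] || return 1
  gc=${seq//[^GC]/}                      # keep only G and C characters
  # Compare gc/len*100 against the bounds without integer rounding.
  (( ${#gc} * 100 >= lower * ${#seq} && ${#gc} * 100 <= upper * ${#seq} ))
}

# telomere_content INTRATELOMERIC_READS GC_WINDOW_READS
# Intratelomeric reads per million reads in the GC window (assumed definition).
telomere_content() {
  [ "$2" -gt 0 ] || return 1
  awk -v tel="$1" -v gc="$2" 'BEGIN { printf "%.2f\n", tel * 1000000 / gc }'
}
```

With made-up numbers, 500 intratelomeric reads among 25,000,000 reads in the GC window give `telomere_content 500 25000000`, i.e. 20.00. Normalizing by GC-matched reads rather than by all reads is meant to cancel GC-dependent coverage bias between samples.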

https://www.dkfz.de/en/applied-bioinformatics/telomerehunter/telomerehunter.html

Installation

Tested in a Python 2.7 environment on Ubuntu 18.04 (in Docker; host: macOS 10.14).

Dependencies

operating system: Linux

for telomere read extraction and calculation of telomere content

python 2.7.9 (does not work for python 3!)
pysam 0.9.0
PyPDF2 1.26.0
samtools 1.3.1

for visualization

R 3.3.0
ggplot2 2.1.0
reshape2 1.4.1
gridExtra 2.2.1
RColorBrewer 1.1-2
cowplot 0.9.2
svglite 1.2.1

#pip (PyPI); the R packages used for visualization have to be installed separately.
pip install telomerehunter

$ telomerehunter -h

This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions. For details see the GNU General Public License
in the license copy received with TelomereHunter or <http://www.gnu.org/licenses/>.

TelomereHunter 1.1.0

usage: telomerehunter [-h] [-ibt TUMOR_BAM] [-ibc CONTROL_BAM] -o OUTPUT_DIR
                      -p PID [-b BANDING_FILE] [-rt REPEAT_THRESHOLD_SET]
                      [-rl] [-mqt MAPQ_THRESHOLD] [-d]
                      [-r REPEATS [REPEATS ...]] [-con] [-gc1 LOWERGC]
                      [-gc2 UPPERGC] [-nf]
                      [-rc TVRS_FOR_CONTEXT [TVRS_FOR_CONTEXT ...]]
                      [-bp BP_CONTEXT] [-pl] [-pff {pdf,png,svg,all}] [-p1]
                      [-p2] [-p3] [-p4] [-p5] [-p6] [-p7] [-p8] [-prc]

Estimation of telomere content from WGS data of a tumor and/or a control
sample.

optional arguments:
  -h, --help            show this help message and exit
  -ibt TUMOR_BAM, --inputBamTumor TUMOR_BAM
                        Path to the indexed input BAM file of the tumor
                        sample.
  -ibc CONTROL_BAM, --inputBamControl CONTROL_BAM
                        Path to the indexed input BAM file of the control
                        sample.
  -o OUTPUT_DIR, --outPath OUTPUT_DIR
                        Path to the output directory into which all results
                        are written.
  -p PID, --pid PID     Sample name used in output files and diagrams
                        (required).
  -b BANDING_FILE, --bandingFile BANDING_FILE
                        Path to a tab-separated file with information on
                        chromosome banding. The first four columns of the
                        table have to contain the chromosome name, the start
                        and end position and the band name. The table should
                        not have a header. If no banding file is specified,
                        the banding information of hg19 will be used.
  -rt REPEAT_THRESHOLD_SET, --repeatThreshold REPEAT_THRESHOLD_SET
                        The number of repeats needed for a read to be
                        classified as telomeric. If no repeat threshold is
                        defined, TelomereHunter will calculate the
                        repeat_threshold depending on the read length with the
                        following formula: repeat_threshold =
                        floor(read_length * 6/100)
  -rl, --perReadLength  Repeat threshold is set per 100 bp read length. The
                        used repeat threshold will be: floor(read_length *
                        repeat_threshold/100) E.g. Setting -rt 8 -rl means
                        that 8 telomere repeats are required per 100 bp read
                        length. If the read length is 50 bp, the threshold is
                        set to 4.
  -mqt MAPQ_THRESHOLD, --mappingQualityThreshold MAPQ_THRESHOLD
                        The mapping quality needed for a read to be considered
                        as mapped (default = 8).
  -d, --removeDuplicates
                        Reads marked as duplicates in the input bam file(s)
                        are removed in the filtering step.
  -r REPEATS [REPEATS ...], --repeats REPEATS [REPEATS ...]
                        List of telomere repeat types to search for. Reverse
                        complements are automatically generated and do not
                        need to be specified! By default, TelomereHunter
                        searches for t-, g-, c- and j-type repeats (TTAGGG
                        TGAGGG TCAGGG TTGGGG).
  -con, --consecutive   Search for consecutive repeats.
  -gc1 LOWERGC, --lowerGC LOWERGC
                        Lower limit used for GC correction of telomere
                        content. The value must be an integer between 0 and
                        100 (default = 48).
  -gc2 UPPERGC, --upperGC UPPERGC
                        Upper limit used for GC correction of telomere
                        content. The value must be an integer between 0 and
                        100 (default = 52).
  -nf, --noFiltering    If the filtering step of TelomereHunter has already
                        been run previously, skip this step.
  -rc TVRS_FOR_CONTEXT [TVRS_FOR_CONTEXT ...], --repeatsContext TVRS_FOR_CONTEXT [TVRS_FOR_CONTEXT ...]
                        List of telomere variant repeats for which to analyze
                        the sequence context. Reverse complements are
                        automatically generated and do not need to be
                        specified! Counts for these telomere variant repeats
                        (arbitrary and singleton context) will be added to the
                        summary table. Default repeats: TCAGGG TGAGGG TTGGGG
                        TTCGGG TTTGGG ATAGGG CATGGG CTAGGG GTAGGG TAAGGG).
  -bp BP_CONTEXT, --bpContext BP_CONTEXT
                        Number of base pairs on either side of the telomere
                        variant repeat to investigate. Please use a number
                        that is divisible by 6.
  -pl, --parallel       The filtering, sorting and estimating steps of the
                        tumor and control sample are run in parallel. This
                        will speed up the computation time of TelomereHunter.
  -pff {pdf,png,svg,all}, --plotFileFormat {pdf,png,svg,all}
                        File format of output diagrams. Choose from pdf
                        (default), png, svg or all (pdf, png and svg).
  -p1, --plotChr        Make diagrams with telomeric reads mapping to each
                        chromosome. If none of the options p1/p2/p3/p4/p5/p6
                        are chosen, all diagrams will be created.
  -p2, --plotFractions  Make a diagram with telomeric reads in each fraction
                        (intrachromosomal, subtelomeric, junction spanning,
                        intratelomeric). If none of the options
                        p1/p2/p3/p4/p5/p6 are chosen, all diagrams will be
                        created.
  -p3, --plotTelContent
                        Make a diagram with the gc corrected telomere content
                        in the analyzed samples. If none of the options
                        p1/p2/p3/p4/p5/p6 are chosen, all diagrams will be
                        created.
  -p4, --plotGC         Make a diagram with GC content distributions in all
                        reads and in intratelomeric reads. If none of the
                        options p1/p2/p3/p4/p5/p6 are chosen, all diagrams
                        will be created.
  -p5, --plotRepeatFreq
                        Make histograms of the repeat frequencies per
                        intratelomeric read. If none of the options
                        p1/p2/p3/p4/p5/p6 are chosen, all diagrams will be
                        created.
  -p6, --plotTVR        Make plots for telomere variant repeats.
  -p7, --plotSingleton  Make plots for singleton telomere variant repeats.
  -p8, --plotNone       Do not make any diagrams. If none of the options
                        p1/p2/p3/p4/p5/p6/p7/p8 are chosen, all diagrams will
                        be created.
  -prc, --plotRevCompl  Distinguish between forward and reverse complement
                        telomere repeats in diagrams.

Contact Lina Sieverling (l.sieverling@dkfz-heidelberg.de) for questions and
support.
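The -rt/-rl help text defines the repeat threshold with two floor formulas. A small bash sketch of that arithmetic (`repeat_threshold` is a hypothetical helper, not part of TelomereHunter) helps when choosing a value for a given read length. Using -rt alone without -rl needs no calculation, since the number is then used as is.

```bash
#!/usr/bin/env bash
# repeat_threshold READ_LENGTH [REPEATS_PER_100BP]
#   default (no -rt):   floor(read_length * 6 / 100)
#   with -rt N -rl:     floor(read_length * N / 100)
# Formulas taken from the telomerehunter -h output; the function name is hypothetical.
repeat_threshold() {
  local read_length=$1 per_100bp=${2:-6}
  # Bash integer division truncates; for positive operands this equals floor().
  echo $(( read_length * per_100bp / 100 ))
}
```

For example, `repeat_threshold 100` prints 6, `repeat_threshold 150` prints 9, and `repeat_threshold 50 8` prints 4, which matches the "-rt 8 -rl" example in the help text.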

Building the Docker image

git clone https://github.com/cancerit/telomerehunter-docker.git
cd telomerehunter-docker/
docker build -t telomerehunter .
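To use the image built above, the directory holding the BAM files has to be mounted into the container. A minimal wrapper sketch, assuming the image is tagged `telomerehunter` as in the build command and that the `telomerehunter` executable is on PATH inside the container with no ENTRYPOINT getting in the way (not verified against the cancerit Dockerfile). `th_docker` and the `DRY_RUN` variable are hypothetical names.

```bash
#!/usr/bin/env bash
# th_docker DATA_DIR [telomerehunter options...]
# Mounts DATA_DIR at /data and runs telomerehunter there, so relative BAM and
# output paths given in the options resolve inside DATA_DIR.
# Set DRY_RUN=1 to print the docker command instead of running it.
th_docker() {
  local datadir
  datadir=$(cd "$1" && pwd) || return 1   # absolute path required by docker -v
  shift
  local cmd=(docker run --rm -v "$datadir":/data -w /data
             telomerehunter telomerehunter "$@")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage: `th_docker ~/project -p sample1 -o th_out -ibt tumor.bam -ibc control.bam`. Files written into th_out will typically be owned by root unless you also pass a `-u` option to docker run.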

Usage

Specify the tumor BAM and, optionally, the control BAM.

telomerehunter -p prefix -o outdir -ibt tumor.bam -ibc control.bam

-ibt Path to the indexed input BAM file of the tumor sample.
-ibc Path to the indexed input BAM file of the control sample.
-o Path to the output directory into which all results are written.
-p Sample name used in output files and diagrams (required).
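Both -ibt and -ibc expect indexed BAM files. A pre-flight wrapper sketch (`run_th` and `has_index` are hypothetical names) that checks for a `.bai` index next to each BAM, adds -ibc only when a control BAM is given, and then calls telomerehunter. Only `.bai` files are checked, which is a simplification; a missing index can be created with `samtools index`, listed in the dependencies above.

```bash
#!/usr/bin/env bash
# has_index BAM -> success if BAM.bai or <name>.bai (without .bam) exists
has_index() {
  [ -e "$1.bai" ] || [ -e "${1%.bam}.bai" ]
}

# run_th PID OUTDIR TUMOR_BAM [CONTROL_BAM]
# Set DRY_RUN=1 to print the command instead of running it.
run_th() {
  local pid=$1 outdir=$2 bam
  local cmd=(telomerehunter -p "$pid" -o "$outdir" -ibt "$3")
  [ -n "${4:-}" ] && cmd+=(-ibc "$4")     # control sample is optional
  for bam in "$3" ${4:+"$4"}; do
    if ! has_index "$bam"; then
      echo "no index for $bam; create one with: samtools index $bam" >&2
      return 1
    fi
  done
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage: `run_th prefix outdir tumor.bam control.bam`, or `run_th prefix outdir tumor.bam` for a tumor-only run.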

Citation

TelomereHunter – in silico estimation of telomere content and composition from cancer genomes

Lars Feuerbach, Lina Sieverling, Katharina I. Deeg, Philip Ginsbach, Barbara Hutter, Ivo Buchhalter, Paul A. Northcott, Sadaf S. Mughal, Priya Chudasama, Hanno Glimm, Claudia Scholl, Peter Lichter, Stefan Fröhling, Stefan M. Pfister, David T. W. Jones, Karsten Rippe, Benedikt Brors
BMC Bioinformatics volume 20, Article number: 272 (2019)

macでインフォマティクス