nanoQC: analyzing the length and quality of nanopore long reads

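The paper describes NanoPack, a set of tools for visualizing and processing long-read sequencing data from Oxford Nanopore Technologies and Pacific Biosciences. The NanoPack tools are written in Python 3 and released under the GNU GPL 3.0 license. The source code is at https://github.com/wdecoster/nanopack, together with links to the individual scripts and their documentation. The scripts are compatible with Linux, macOS, and the Windows Subsystem for Linux on MS Windows 10. They are available as a graphical user interface, as a web service at http://nanoplot.bioinf.be, and as command-line tools.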

If you have the report file from basecalling, you can also use the tools online:

http://nanoplot.bioinf.be


Drag and drop the sequencing_summary.txt file produced by Albacore/Guppy into the window at the link above (maximum 100 MB).
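Before uploading, it can help to check the file size and take a quick look at read lengths and qualities locally. The following is a minimal Python sketch, not part of NanoPack. It assumes a tab-separated sequencing_summary.txt with `sequence_length_template` and `mean_qscore_template` columns (column names differ between basecaller versions, so check your own header), and it interprets the 100 MB limit as 100 × 1024 × 1024 bytes. The helper names `fits_upload_limit` and `summarize` are hypothetical.

```python
import csv
import io
import os
import statistics

# Upload limit of the web service; assumes "100 MB" means 100 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def fits_upload_limit(path):
    """Return True if the file at `path` is no larger than the upload limit."""
    return os.path.getsize(path) <= MAX_UPLOAD_BYTES


def summarize(handle, length_col="sequence_length_template",
              qual_col="mean_qscore_template"):
    """Return (read count, median read length, mean Q-score) from a tab-separated summary.

    The default column names are an assumption; pass others if your header differs.
    """
    lengths, quals = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        lengths.append(int(row[length_col]))
        quals.append(float(row[qual_col]))
    if not lengths:
        raise ValueError("summary file contains no reads")
    return len(lengths), statistics.median(lengths), statistics.mean(quals)


if __name__ == "__main__":
    # Small in-memory example instead of a real sequencing_summary.txt.
    demo = io.StringIO(
        "read_id\tsequence_length_template\tmean_qscore_template\n"
        "r1\t1200\t9.5\n"
        "r2\t5400\t11.0\n"
        "r3\t800\t7.5\n"
    )
    print(summarize(demo))
```

For a real file, check `fits_upload_limit("sequencing_summary.txt")` first, then open the file with `newline=""` and pass the handle to `summarize`.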

Installation on a local machine

Tested in an anaconda3-5.1.0 environment on macOS 10.14.

Source: GitHub

# In an Anaconda environment, install with conda
conda install -y -c bioconda nanoQC

# Alternatively, use pip
pip install nanoQC


$ nanoQC -h

usage: nanoQC [-h] [-v] [-o OUTDIR] [-l MINLEN] fastq

Investigate nucleotide composition and base quality.

positional arguments:
  fastq                 Reads data in fastq.gz format.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         Print version and exit.
  -o OUTDIR, --outdir OUTDIR
                        Specify directory in which output has to be created.
  -l MINLEN, --minlen MINLEN
                        Filters the reads on a minimal length of the given
                        range. Also plots the given length/2 of the begin and
                        end of the reads.
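The help text for `-l/--minlen` is terse. Read literally, reads shorter than MINLEN are dropped, and the first and last MINLEN/2 bases of each remaining read are what gets plotted. The sketch below is a hypothetical Python illustration of that reading, not nanoQC's source code; whether a read of exactly MINLEN bases is kept is an assumption (here it is kept).

```python
def head_and_tail(seqs, minlen):
    """Hypothetical illustration of the -l/--minlen behaviour described in `nanoQC -h`.

    Reads shorter than `minlen` are dropped (assumption: reads of exactly
    `minlen` bases are kept). For each remaining read, the first and last
    minlen // 2 bases are returned, i.e. the windows that would be plotted.
    """
    if minlen < 2:
        raise ValueError("minlen must be at least 2 so that minlen // 2 > 0")
    half = minlen // 2
    heads, tails = [], []
    for seq in seqs:
        if len(seq) < minlen:
            continue  # removed by the minimal-length filter
        heads.append(seq[:half])   # start of the read
        tails.append(seq[-half:])  # end of the read
    return heads, tails


reads = ["ACGTACGTAC", "ACG", "TTTTGGGGCCCCAAAA"]
print(head_and_tail(reads, 8))  # (['ACGT', 'TTTT'], ['GTAC', 'AAAA'])
```

With MINLEN = 8, the 3-base read is filtered out and only 4-base windows from each end of the other two reads remain.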

How to run

nanoQC input.fastq.gz -o OUTDIR
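The help output above says the input must be in fastq.gz format. If your reads are in plain FASTQ, a minimal Python sketch like the following can compress them first and then call nanoQC using only the options listed by `nanoQC -h` (`-o`, `-l`). The function names are hypothetical, and the script only prints the command if `nanoQC` is not found on PATH.

```python
import gzip
import shutil
import subprocess
import sys
from pathlib import Path


def ensure_gzipped(fastq):
    """Return a path to a gzip-compressed FASTQ, compressing a plain file if needed."""
    src = Path(fastq)
    if src.suffix == ".gz":
        return src  # already compressed
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def nanoqc_command(fastq_gz, outdir, minlen=None):
    """Build a nanoQC argument list using only the options shown by `nanoQC -h`."""
    cmd = ["nanoQC", str(fastq_gz), "-o", str(outdir)]
    if minlen is not None:
        cmd += ["-l", str(minlen)]
    return cmd


if __name__ == "__main__" and len(sys.argv) > 1:
    cmd = nanoqc_command(ensure_gzipped(sys.argv[1]), "OUTDIR")
    if shutil.which("nanoQC"):
        subprocess.run(cmd, check=True)
    else:
        print("nanoQC not found on PATH; command would be:", " ".join(cmd))
```

Saved under a hypothetical name such as run_nanoqc.py, it would be run as `python run_nanoqc.py input.fastq`.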

Output


The results are rendered as interactive plots with Bokeh.
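nanoQC's help describes its purpose as investigating nucleotide composition and base quality. The sketch below is a hypothetical, dependency-free Python illustration of that kind of per-position summary over the first `window` bases of each read: the base fractions and the mean quality at each position. It assumes Phred+33 quality strings and uses a plain arithmetic mean of Phred values; nanoQC's own calculation and its Bokeh plotting may differ.

```python
from collections import Counter


def per_position_stats(records, window):
    """Per-position base fractions and mean Phred quality over the first `window` bases.

    `records` is an iterable of (sequence, quality_string) pairs; quality
    strings are assumed to be Phred+33 encoded. Positions that no read
    reaches get an empty composition and a mean quality of None.
    """
    counts = [Counter() for _ in range(window)]
    qual_sums = [0] * window
    depth = [0] * window
    for seq, qual in records:
        for i, (base, q) in enumerate(zip(seq[:window], qual[:window])):
            counts[i][base] += 1
            qual_sums[i] += ord(q) - 33  # Phred+33 character -> quality score
            depth[i] += 1
    composition = [
        {base: c / depth[i] for base, c in counts[i].items()} if depth[i] else {}
        for i in range(window)
    ]
    mean_quality = [qual_sums[i] / depth[i] if depth[i] else None
                    for i in range(window)]
    return composition, mean_quality


comp, mq = per_position_stats([("ACGT", "IIII"), ("AGG", "+++")], window=4)
print(comp)  # [{'A': 1.0}, {'C': 0.5, 'G': 0.5}, {'G': 1.0}, {'T': 1.0}]
print(mq)    # [25.0, 25.0, 25.0, 40.0]
```

Here 'I' encodes Q40 and '+' encodes Q10, so positions covered by both reads average to 25.0, while the last position is covered only by the first read.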

Citation

NanoPack: visualizing and processing long-read sequencing data
Wouter De Coster, Svenn D’Hert, Darrin T Schultz, Marc Cruts, Christine Van Broeckhoven
Bioinformatics, Volume 34, Issue 15, 1 August 2018, Pages 2666–2669