2020 5/25 Title corrected
2020 11/11 Docker link added
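Although the set of currently known viruses is steadily expanding, only a tiny fraction of the viruses on Earth has been sequenced so far. Shotgun metagenomic sequencing offers an opportunity to reveal novel viruses, but it faces the computational challenge of identifying viral genomes, which are often difficult to detect in metagenomic assemblies.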
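This study describes metaviralSPAdes, a tool for identifying viral genomes in metagenomic assembly graphs based on analyzing the variation in coverage depth between viral and bacterial chromosomes. We benchmark metaviralSPAdes on diverse metagenomic datasets, verify our predictions using a set of virus-specific hidden Markov models, and demonstrate that it improves on state-of-the-art viral identification pipelines.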
metaviralSPAdes includes the viralAssembly, viralVerify, and viralComplete modules, which are available as standalone packages at https://github.com/ablab/spades/tree/metaviral_publication, https://github.com/ablab/viralVerify/, and https://github.com/ablab/viralComplete/.
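To make the coverage-depth idea above concrete, here is a minimal Python sketch. It is not the metaviralSPAdes algorithm, which works on the assembly graph. It only assumes that contig names follow the usual SPAdes pattern `NODE_<id>_length_<len>_cov_<cov>`, and it flags contigs whose coverage departs strongly from the length-weighted median. The function names and the `fold` threshold are hypothetical and chosen for illustration.

```python
import re

# Assumed SPAdes-style contig name: NODE_<id>_length_<len>_cov_<cov>[...]
HEADER_RE = re.compile(r"NODE_(\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def parse_header(header):
    """Return (length, coverage) parsed from a SPAdes-style contig name."""
    match = HEADER_RE.search(header)
    if match is None:
        raise ValueError("not a SPAdes-style header: %r" % header)
    return int(match.group(2)), float(match.group(3))


def weighted_median_coverage(contigs):
    """Length-weighted median coverage over (length, coverage) pairs."""
    ordered = sorted(contigs, key=lambda c: c[1])
    half = sum(length for length, _ in ordered) / 2.0
    running = 0
    for length, cov in ordered:
        running += length
        if running >= half:
            return cov
    raise ValueError("no contigs given")


def coverage_outliers(headers, fold=5.0):
    """Headers whose coverage is `fold` times above or below the median."""
    parsed = [(h, parse_header(h)) for h in headers]
    median = weighted_median_coverage([p for _, p in parsed])
    return [h for h, (_, cov) in parsed
            if cov >= median * fold or cov * fold <= median]
```

Contigs reported by such a filter are only candidates for closer inspection; a coverage difference alone does not show that a contig is viral.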
Installation
Tested on Ubuntu 18.04.
cd spades/assembler/
./spades_compile.sh
> ./metaviralspades.py
$ ./metaviralspades.py
SPAdes genome assembler v3.14.0-dev [metaplasmidSPAdes mode]
Usage: spades.py [options] -o <output_dir>
Basic options:
-o <output_dir> directory to store all the resulting files (required)
--iontorrent this flag is required for IonTorrent data
--test runs SPAdes on toy dataset
-h, --help prints this usage message
-v, --version prints version
Input data:
--12 <filename> file with interlaced forward and reverse paired-end reads
-1 <filename> file with forward paired-end reads
-2 <filename> file with reverse paired-end reads
-s <filename> file with unpaired reads
--merged <filename> file with merged forward and reverse paired-end reads
--pe-12 <#> <filename> file with interlaced reads for paired-end library number <#>.
Older deprecated syntax is -pe<#>-12 <filename>
--pe-1 <#> <filename> file with forward reads for paired-end library number <#>.
Older deprecated syntax is -pe<#>-1 <filename>
--pe-2 <#> <filename> file with reverse reads for paired-end library number <#>.
Older deprecated syntax is -pe<#>-2 <filename>
--pe-s <#> <filename> file with unpaired reads for paired-end library number <#>.
Older deprecated syntax is -pe<#>-s <filename>
--pe-m <#> <filename> file with merged reads for paired-end library number <#>.
Older deprecated syntax is -pe<#>-m <filename>
--pe-or <#> <or> orientation of reads for paired-end library number <#>
(<or> = fr, rf, ff).
Older deprecated syntax is -pe<#>-<or>
--s <#> <filename> file with unpaired reads for single reads library number <#>.
Older deprecated syntax is --s<#> <filename>
--mp-12 <#> <filename> file with interlaced reads for mate-pair library number <#>.
Older deprecated syntax is -mp<#>-12 <filename>
--mp-1 <#> <filename> file with forward reads for mate-pair library number <#>.
Older deprecated syntax is -mp<#>-1 <filename>
--mp-2 <#> <filename> file with reverse reads for mate-pair library number <#>.
Older deprecated syntax is -mp<#>-2 <filename>
--mp-s <#> <filename> file with unpaired reads for mate-pair library number <#>.
Older deprecated syntax is -mp<#>-s <filename>
--mp-or <#> <or> orientation of reads for mate-pair library number <#>
(<or> = fr, rf, ff).
Older deprecated syntax is -mp<#>-<or>
--hqmp-12 <#> <filename> file with interlaced reads for high-quality mate-pair library number <#>.
Older deprecated syntax is -hqmp<#>-12 <filename>
--hqmp-1 <#> <filename> file with forward reads for high-quality mate-pair library number <#>.
Older deprecated syntax is -hqmp<#>-1 <filename>
--hqmp-2 <#> <filename> file with reverse reads for high-quality mate-pair library number <#>.
Older deprecated syntax is -hqmp<#>-2 <filename>
--hqmp-s <#> <filename> file with unpaired reads for high-quality mate-pair library number <#>.
Older deprecated syntax is -hqmp<#>-s <filename>
--hqmp-or <#> <or> orientation of reads for high-quality mate-pair library number <#>
(<or> = fr, rf, ff).
Older deprecated syntax is -hqmp<#>-<or>
--nxmate-1 <#> <filename> file with forward reads for Lucigen NxMate library number <#>.
Older deprecated syntax is -nxmate<#>-1 <filename>
--nxmate-2 <#> <filename> file with reverse reads for Lucigen NxMate library number <#>.
Older deprecated syntax is -nxmate<#>-2 <filename>
--sanger <filename> file with Sanger reads
--pacbio <filename> file with PacBio reads
--nanopore <filename> file with Nanopore reads
--tslr <filename> file with TSLR-contigs
--trusted-contigs <filename>
file with trusted contigs
--untrusted-contigs <filename>
file with untrusted contigs
Pipeline options:
--only-error-correction runs only read error correction (without assembling)
--only-assembler runs only assembling (without read error correction)
--careful tries to reduce number of mismatches and short indels
--checkpoints <last or all>
save intermediate check-points ('last', 'all')
--continue continue run from the last available check-point
--restart-from <cp> restart run with updated options and from the specified check-point
('ec', 'as', 'k<int>', 'mc', 'last')
--disable-gzip-output forces error correction not to compress the corrected reads
--disable-rr disables repeat resolution stage of assembling
Advanced options:
--dataset <filename> file with dataset description in YAML format
-t <int>, --threads <int> number of threads. [default: 16]
-m <int>, --memory <int> RAM limit for SPAdes in Gb (terminates if exceeded). [default: 250]
--tmp-dir <dirname> directory for temporary files. [default: <output_dir>/tmp]
-k <int> [<int> ...] list of k-mer sizes (must be odd and less than 128)
[default: 'auto']
--cov-cutoff <float> coverage cutoff value (a positive float number, or 'auto', or 'off')
[default: 'off']
--phred-offset <33 or 64> PHRED quality offset in the input reads (33 or 64),
[default: auto-detect]
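The constraints printed in the help output above can be checked before launching a long run. The following Python sketch covers only `-t`, `-m`, and `-k`, and rejects k-mer sizes that are not odd or not below 128, as the help text requires. `advanced_args` is a hypothetical helper name, not part of SPAdes.

```python
def advanced_args(threads=None, memory_gb=None, kmers=None):
    """Translate optional settings into flags from the 'Advanced options' help.

    Hypothetical helper; only -t, -m and -k are covered.
    """
    args = []
    if threads is not None:
        if threads < 1:
            raise ValueError("threads must be at least 1")
        args += ["-t", str(threads)]
    if memory_gb is not None:
        if memory_gb < 1:
            raise ValueError("memory limit must be at least 1 Gb")
        args += ["-m", str(memory_gb)]
    if kmers:
        for k in kmers:
            # The help text requires odd k-mer sizes below 128.
            if k < 1 or k % 2 == 0 or k >= 128:
                raise ValueError("invalid k-mer size: %d" % k)
        # The help shows the list as space-separated values after -k.
        args += ["-k"] + [str(k) for k in kmers]
    return args
```

For example, `advanced_args(threads=12, kmers=[21, 33, 55])` yields a list of arguments that can be appended to the basic command line.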
An unofficial Docker image has also been uploaded.
#dockerhub(link)
docker pull nakor/metaviralspades:latest
docker run --rm -itv $PWD:/data/ -w /data nakor/metaviralspades metaviralspades.py -1 /data/pair1.fq -2 /data/pair2.fq -o /data/output_dir
Usage
Specify the metagenomic sequencing reads.
./metaviralspades.py -1 input_1.fq.gz -2 input_2.fq.gz -o outdir -t 12
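When many samples have to be processed, the command above can be assembled and launched from Python. This is a minimal sketch that assumes `metaviralspades.py` sits in the current directory, as in the example; `build_command` and `run_assembly` are hypothetical helper names.

```python
import subprocess


def build_command(reads1, reads2, outdir, threads=12,
                  script="./metaviralspades.py"):
    """Build the argument list for one paired-end metaviralSPAdes run."""
    return [script, "-1", reads1, "-2", reads2, "-o", outdir,
            "-t", str(threads)]


def run_assembly(reads1, reads2, outdir, threads=12):
    """Run the assembler and raise CalledProcessError if it fails."""
    cmd = build_command(reads1, reads2, outdir, threads)
    # Passing a list avoids shell quoting problems in file names.
    subprocess.run(cmd, check=True)
```

Calling `run_assembly("input_1.fq.gz", "input_2.fq.gz", "outdir")` corresponds to the shell command shown above.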
Citation
metaviralSPAdes: assembly of viruses from metagenomic data
Dmitry Antipov, Mikhail Raiko, Alla Lapidus, Pavel A Pevzner
Bioinformatics, Published: 15 May 2020
Related