Pomoxisは、ナノポアシーケンスに合わせて調整された基本的なバイオインフォマティクスツールのセット。ドラフトアセンブリの生成分析に関するツールが含まれている。 これらのツールの多くは、Oxford Nanopore Technologiesの研究データ分析グループによって使用されている。
Document
Pomoxis - bioinformatics tools for nanopore research — Pomoxis 0.2.2 documentation
特徴(documentより)
- Wraps third party tools with known good default parameters and methods of use.
- Creates an isolated environment with all third-party tools.
- Can be installed with conda.
- Streamlines common short analysis chains.
- Includes a nanopore read simulator.
- Server/client components for minimap2 and bwa.
- Integrates into katuali for performing more complex analysis pipelines.
- Open source (Mozilla Public License 2.0).
インストール
依存
Installation from source
- gcc-4.9
- g++-4.9
- zlib1g-dev
- libncurses5-dev
- python3-all-dev
- libhdf5-dev
- libatlas-base-dev
- libopenblas-base
- libopenblas-dev
- libbz2-dev
- liblzma-dev
- libffi-dev
- make
- python-virtualenv
- cmake (for racon)
- wget (for fetching modules from github)
- bzip2 (for extracting those modules)
requires the user to provide several binaries:
- minimap2
- miniasm
- racon
- samtools
- bcftools
- seqkit
本体 Github
#virtualenvを使用
virtualenv pomoxis --python=python3 --prompt "(pomoxis) "
. pomoxis/bin/activate
pip install git+https://github.com/rrwick/Porechop
pip install pomoxis
#またはvirtualenvを使わずpipで導入
pip install git+https://github.com/rrwick/Porechop
pip install pomoxis
> mini_align
$ mini_align
mini_align [-h] -r <reference> -i <fastq>
Align fastq/a formatted reads to a genome using minimap2.
-h show this help text.
-r reference, should be a fasta file. If correspondng minimap indices
do not exist they will be created. (required).
-i fastq/a input reads (required).
-I split index every ~NUM input bases (default: 16G, this is larger
than the usual minimap2 default).
-a aggressively extend gaps (sets -A1 -B2 -O2 -E1 for minimap2).
-P filter to only primary alignments (i.e. run samtools view -F 2308.
Deprecated: this filter is now default and can be disabled with -A.)
-A do not filter alignments to primary alignments, output all.
-n sort bam by read name.
-c chunk size. Input reads/contigs will be broken into chunks
prior to alignment.
-t alignment threads (default: 1).
-p output file prefix (default: reads).
-m fill MD tag.
-i and -r must be specified.
> mini_assemble
$ mini_assemble
mini_assemble [-h] -i <fastq>
Assemble fastq/fasta formatted reads and perform POA consensus.
-h show this help text.
-i fastx input reads (required).
-r reference fasta for reference-guided consensus (instead of de novo assembly)
-o output folder (default: assm).
-p output file prefix (default: reads).
-t number of minimap and racon threads (default: 1).
-m number of racon rounds (default: 4).
-n number of racon shuffles (default: 1. If not 1, should be >10).
-w racon window length (default: 500).
-k keep intermediate files (default: delete).
-K minimap's -K option (default: 500M).
-c trim adapters from reads prior to everything else.
-e error correct longest e% of reads prior to assembly, or an estimated coverage (e.g. 2x).
For an estimated coverage, the -l option must be a fastx or a length (e.g. 4.8mb).
-l Reference length, either as a number (e.g. 4.8mb) or fastx from which length will be calculated.
-x log all commands before running.
-i must be specified.
> assess_assembly
$ assess_assembly
assess_assembly [-h] -r <reference> -i <fastq>
Calculate accuracy statistics for an assembly.
-h show this help text.
-r reference, should be a fasta file. If correspondng bwa indices
do not exist they will be created. (required).
-i fastq/a input assembly (required).
-c chunk size. Input reads/contigs will be broken into chunks
prior to alignment, 0 will not chunk (default 100000).
-C catalogue errors.
-t alignment threads (default: 1).
-p output file prefix (default: assm).
-T trim consensus to primary alignments of truth to assembly.
-b .bed file of reference regions to assess.
-i and -r must be specified.
> simulate_calls -h
$ simulate_calls -h
usage: simulate_calls [-h] [--mu MU] [--sigma SIGMA] [--noise NOISE]
[--threads THREADS]
fasta ncalls
Simulate basecalls with scrappy.
positional arguments:
fasta Source sequence file.
ncalls Number of basecalls to produce.
optional arguments:
-h, --help show this help message and exit
--mu MU mean fragment length.
--sigma SIGMA stdv fragment length.
--noise NOISE Additional Gaussian noise on signal.
--threads THREADS number of worker threads.
> subsample_bam
$ subsample_bam
usage: subsample bam to uniform or proportional depth [-h] [-o OUTPUT_PREFIX]
[-r REGIONS [REGIONS ...]]
[-p PROFILE]
[-O {fwd,rev}]
[-t THREADS]
[-q QUALITY]
[-a ACCURACY]
[-c COVERAGE]
[--any_fail | --all_fail]
[-x PATIENCE]
[-s STRIDE] [-P]
[-S SEED]
bam depth [depth ...]
subsample bam to uniform or proportional depth: error: the following arguments are required: bam, depth
実行方法
1、mini_align
マッピング(minimap2)。fastqはgzip圧縮(.gz)にも対応している。
mini_align -t 20 -r ref.fa -i input.fq
- -r reference, should be a fasta file. If correspondng minimap indices do not exist they will be created. (required)
- -t alignment threads (default: 1)
- -i fastq/a input reads (required)
出力はcoordinate sortされたbamとbam.baiになる。
2、mini_assemble
de novo アセンブリ(miniasmとminimap)とコンセンサスコール(racon x 4 times)
mini_assemble -t 20 -e 10 -l 4.7m -m 4 -i input.fq
- -i fastx input reads (required)
- -t number of minimap and racon threads (default: 1)
-
-c trim adapters from reads prior to everything else.
-
-e error correct longest e% of reads prior to assembly, or an estimated coverage (e.g. 2x). For an estimated coverage, the -l option must be a fastx or a length (e.g. 4.8mb)
-
-l Reference length, either as a number (e.g. 4.8mb) or fastx from which length will be calculated
注: テスト時はpolish off(-m 0)にしないと出力がゼロになった。polishが失敗しているにもかかわらず作業ファイル(draft assembly)を消しているからかもしれない。
リファレンスガイドアセンブリとコンセンサスコール(racon)
mini_assemble -t 20 -e 10 -l 4.7m -m 4 -r ref.fa -i input.fq
- -r reference fasta for reference-guided consensus (instead of de novo assembly)
未テスト
3、assess_assembly
アセンブリ評価
assess_assembly -t 20 -r assembly.fa -i input.fq
# Percentage Errors
name mean q10 q50 q90
err_ont 16.926% 13.544% 16.310% 20.197%
err_bal 17.925% 14.164% 17.221% 21.615%
iden 6.706% 5.190% 6.558% 8.406%
del 5.699% 4.222% 5.305% 7.134%
ins 5.903% 4.259% 5.470% 7.339%
# Q Scores
name mean q10 q50 q90
err_ont 7.71 8.68 7.88 6.95
err_bal 7.47 8.49 7.64 6.65
iden 11.74 12.85 11.83 10.75
del 12.44 13.74 12.75 11.47
ins 12.29 13.71 12.62 11.34
標準ではSTDOUT出力される。
4、simulate_calls
oxford nanoporeのロングリードのシミュレーション
refe.faを鋳型にして10000リード発生させる。平均リード長は10000とする。
assess_assembly --threads 20 --mu 10000 refe.fa 10000
- --mu mean fragment length
重たいので使えるだけスレッド指定した方が良いです。
出力
内部で実際に動いているツールとパラメータについては、それぞれの実行用bashスクリプトを開いて確認して下さい。
引用
Pomoxis - bioinformatics tools for nanopore research
© 2018 Oxford Nanopore Technologies Ltd
関連