NanoSim-H is a simulator of ONT reads developed as a fork of NanoSim (introduced previously). It offers the following improvements over the original.
- Support for Python 3
- Support for RNF read names
- Installation from PyPI
- Error profiles distributed with the main package
- Automatic testing using Travis
- Reproducible simulations (setting a seed for PRG)
- Improved interface with new parameters (e.g., for merging all contigs) and a progress bar
- Several minor bugs fixed
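Precomputed error profiles such as ecoli_R7, ecoli_R7.3, ecoli_R9_1D, ecoli_R9_2D, and others are bundled, so you can start simulating right away without training. Output is in FASTA format; no quality information is produced.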
Installation
Dependencies
Main package: GitHub
It can be installed with pip or conda.
pip install --upgrade nanosim-h
conda install -c bioconda -y nanosim-h
$ nanosim-h --help
usage: nanosim-h [-h] [-p str] [-o str] [-n int] [-u float] [-m float]
[-i float] [-d float] [-s int] [--circular] [--perfect]
[--merge-contigs] [--rnf] [--rnf-add-cigar] [--max-len int]
[--min-len int] [--kmer-bias int]
<reference.fa>
Program: NanoSim-H - a simulator of Oxford Nanopore reads.
Version: 1.1.0.3
Authors: Chen Yang <cheny@bcgsc.ca> - author of the original software package (NanoSim)
Karel Brinda <kbrinda@hsph.harvard.edu> - author of the NanoSim-H fork
positional arguments:
<reference.fa> reference genome (- for standard input)
optional arguments:
-h, --help show this help message and exit
-p str, --profile str
error profile - one of precomputed profiles
('ecoli_R7', 'ecoli_R7.3', 'ecoli_R9_1D',
'ecoli_R9_2D', 'ecoli_UCSC1b', 'yeast') or own
directory with an error profile [ecoli_R9_2D]
-o str, --out-pref str
prefix of output file [simulated]
-n int, --number int number of generated reads [10000]
-u float, --unalign-rate float
rate of unaligned reads [detect from the error
profile]
-m float, --mis-rate float
mismatch rate (weight tuning) [1.0]
-i float, --ins-rate float
insertion rate (weight tuning) [1.0]
-d float, --del-rate float
deletion rate (weight tuning) [1.0]
-s int, --seed int initial seed for the pseudorandom number generator (0
for random) [42]
--circular circular simulation (linear otherwise)
--perfect output perfect reads, no mutations
--merge-contigs merge contigs from the reference
--rnf use RNF format for read names
--rnf-add-cigar add cigar to RNF names (not fully debugged, yet)
--max-len int maximum read length [inf]
--min-len int minimum read length [50]
--kmer-bias int prohibits homopolymers with length >= n bases in
output reads [6]
Examples: nanosim-h --circular ecoli_ref.fasta
nanosim-h --circular --perfect ecoli_ref.fasta
nanosim-h -p yeast --kmer-bias 0 yeast_ref.fasta
Notice: the use of `max-len` and `min-len` will affect the read length distributions. If
the range between `max-len` and `min-len` is too small, the program will run slowlier accordingly.
Run
Simulate a circular chromosome with the R9_1D error profile (this does not work if the reference contains multiple headers).
nanosim-h --circular -n 10000 -p ecoli_R9_1D \
  Ecoli_ref.fasta -o simulated
- --perfect output perfect reads, no mutations
- -p <str> one of precomputed profiles ('ecoli_R7', 'ecoli_R7.3', 'ecoli_R9_1D', 'ecoli_R9_2D', 'ecoli_UCSC1b', 'yeast') or own directory with an error profile [ecoli_R9_2D]
- -o prefix of output file [simulated]
- -n number of generated reads [10000]
Adding --perfect produces error-free reads.
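Since the circular run above reportedly fails when the reference contains multiple headers, it may be worth checking the reference before simulating. The following is a minimal sketch in Python, not part of NanoSim-H; the function names `fasta_headers` and `check_single_record` are hypothetical helpers introduced here for illustration.

```python
def fasta_headers(lines):
    """Collect FASTA header names (the text after '>') from an iterable of lines."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def check_single_record(fasta_path):
    """Return the single header of a FASTA file, or raise if there is not exactly one.

    Hypothetical helper: it only inspects the reference and does not call nanosim-h.
    """
    with open(fasta_path) as handle:
        headers = fasta_headers(handle)
    if len(headers) != 1:
        raise ValueError(
            f"{fasta_path}: expected 1 record, found {len(headers)}: {headers}"
        )
    return headers[0]
```

Calling `check_single_record("Ecoli_ref.fasta")` before the simulation would stop early on a multi-record reference. The help output also lists a `--merge-contigs` option for merging contigs from the reference; whether it avoids the multi-header problem was not tested here.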
The <prefix>.errors.txt file written alongside the FASTA lists the position and type of each introduced error.
$ head simulated.errors.txt
Seq_name Seq_pos error_type error_length ref_base seq_base
chr_1691692_aligned_588 6860 del 1 T -
chr_1691692_aligned_588 6833 ins 2 -- AC
chr_1691692_aligned_588 6828 mis 1 G A
chr_1691692_aligned_588 6825 mis 1 T A
chr_1691692_aligned_588 6820 ins 2 -- CT
chr_1691692_aligned_588 6817 ins 2 -- GC
chr_1691692_aligned_588 6816 mis 1 C A
chr_1691692_aligned_588 6806 ins 1 - T
chr_1691692_aligned_588 6800 del 1 T -
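To summarize such a file, the rows can be tallied per error type. The sketch below assumes the columns are whitespace-separated and appear in the order of the header shown above; `tally_errors` is a hypothetical helper, and the sample rows are copied from the output above.

```python
from collections import Counter


def tally_errors(lines):
    """Count events and affected bases per error type.

    Assumes whitespace-separated columns in the order
    Seq_name, Seq_pos, error_type, error_length, ref_base, seq_base.
    """
    events = Counter()
    bases = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] == "Seq_name":
            continue  # skip blank lines and the header row
        error_type, length = fields[2], int(fields[3])
        events[error_type] += 1
        bases[error_type] += length
    return events, bases


sample = """Seq_name Seq_pos error_type error_length ref_base seq_base
chr_1691692_aligned_588 6860 del 1 T -
chr_1691692_aligned_588 6833 ins 2 -- AC
chr_1691692_aligned_588 6828 mis 1 G A
chr_1691692_aligned_588 6825 mis 1 T A
chr_1691692_aligned_588 6820 ins 2 -- CT
chr_1691692_aligned_588 6817 ins 2 -- GC
chr_1691692_aligned_588 6816 mis 1 C A
chr_1691692_aligned_588 6806 ins 1 - T
chr_1691692_aligned_588 6800 del 1 T -
""".splitlines()

events, bases = tally_errors(sample)
print(dict(events))  # {'del': 2, 'ins': 4, 'mis': 3}
print(dict(bases))   # {'del': 2, 'ins': 7, 'mis': 3}
```

For a real run, an open file handle can be passed instead of the sample, for example `with open("simulated.errors.txt") as fh: events, bases = tally_errors(fh)`.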
To use an error profile other than those listed above, use the nanosim-h-train command (LAST and R are required). See the GitHub repository for details.
References
https://github.com/karel-brinda/nanosim-h
How to simulate nanopore reads?
https://bioinformatics.stackexchange.com/questions/5259/how-to-simulate-nanopore-reads