2020 3/23 Fixed a mistake in a command
2020 3/24 Added further explanation
2020 10/10 Added tweet
Documentation
We've release v1.1.2 of @nanopore's medaka software. Updates include: consensus model for Guppy 4.0.11, a true ploidy-1 variant caller, doesn't break contigs at unpolished regions, diploid calling option to produce candidate variants for DeepVariant, binaries for ARM.
- Chris Wright (@chrisnrg), October 7, 2020
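Features
- Requires only basecalled data (.fasta or .fastq)
- Improved accuracy over graph-based methods (e.g. Racon)
- 50x faster than Nanopolish (it can run on GPUs)
- Includes extras for implementing and training bespoke correction networks
- Open source (Mozilla Public License 2.0)
Installation
Runs on Linux. Here it was installed via bioconda on Ubuntu 18.04 LTS (inside Docker, without a GPU).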
- gcc
- zlib1g-dev
- libbz2-dev
- liblzma-dev
- libffi-dev
- libncurses5-dev
- make
- wget
- python3-all-dev
- python-virtualenv
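The packages listed above can be installed in one step with apt. This is a minimal sketch, assuming the list is the set of system prerequisites on an Ubuntu host (the list itself does not say when they are needed). It only prints the command, so that it can be reviewed before being run as root.

```shell
# System packages copied from the list above.
packages="gcc zlib1g-dev libbz2-dev liblzma-dev libffi-dev libncurses5-dev make wget python3-all-dev python-virtualenv"

# Build the install command and print it for review.
# Run the printed command as root (or prefix it with sudo) to install.
install_cmd="apt-get install -y $packages"
echo "$install_cmd"
```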
Main repository: GitHub
#bioconda (link)
#Here medaka is installed into a conda environment named medaka-env.
conda create -n medaka-env -y
conda activate medaka-env
conda install -c conda-forge -c bioconda -y medaka
#pip
pip install medaka
#To enable the use of GPU resources it is necessary to install the
#tensorflow-gpu package. In outline this can be achieved with:
pip uninstall tensorflow
pip install tensorflow-gpu
#Note that the tensorflow-gpu package is compiled against a specific version of the NVIDIA CUDA library; users are directed to the tensorflow installation pages for further information.
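After either install route, a quick check that the executable is reachable can save confusion later. The sketch below assumes the environment name `medaka-env` used above; the variable `medaka_found` is only an illustrative name.

```shell
# Report whether the medaka executable is reachable on PATH.
if command -v medaka >/dev/null 2>&1; then
    medaka_found=yes
    medaka --version    # prints the installed version
else
    medaka_found=no
    echo "medaka not found; run 'conda activate medaka-env' first" >&2
fi
```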
> medaka -h
$ medaka -h
usage: medaka [-h] [--version]
{compress_bam,features,train,consensus,smolecule,consensus_from_features,fastrle,stitch,variant,snp,methylation,tools}
...
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
subcommands:
valid commands
{compress_bam,features,train,consensus,smolecule,consensus_from_features,fastrle,stitch,variant,snp,methylation,tools}
additional help
compress_bam Compress an alignment into RLE form.
features Create features for inference.
train Train a model from features.
consensus Run inference from a trained model and alignments.
smolecule Create consensus sequences from single-molecule reads.
consensus_from_features
Run inference from a trained model on existing
features.
fastrle Create run-length encoded fastq (lengths in quality
track).
stitch Stitch together output from medaka consensus into
final output.
variant Decode probabilities to VCF.
snp Decode probabilities to SNPs.
methylation methylation subcommand.
tools tools subcommand.
Many subcommands are available, but only consensus is covered here.
> medaka consensus -h
$ medaka consensus -h
usage: medaka consensus [-h] [--debug | --quiet] [--batch_size BATCH_SIZE]
[--regions REGIONS [REGIONS ...]]
[--chunk_len CHUNK_LEN] [--chunk_ovlp CHUNK_OVLP]
[--model MODEL] [--disable_cudnn] [--threads THREADS]
[--check_output] [--save_features]
[--tag_name TAG_NAME] [--tag_value TAG_VALUE]
[--tag_keep_missing]
bam output
positional arguments:
bam Input alignments.
output Output file.
optional arguments:
-h, --help show this help message and exit
--debug Verbose logging of debug information. (default: 20)
--quiet Minimal logging; warnings only). (default: 20)
--batch_size BATCH_SIZE
Inference batch size. (default: 100)
--regions REGIONS [REGIONS ...]
Genomic regions to analyse, or a bed file. (default:
None)
--chunk_len CHUNK_LEN
Chunk length of samples. (default: 10000)
--chunk_ovlp CHUNK_OVLP
Overlap of chunks. (default: 1000)
--model MODEL Model definition, default is equivalent to
r941_min_high_g344. {r941_min_fast_g303,
r941_min_high_g303, r941_min_high_g330,
r941_min_high_g344, r941_prom_fast_g303,
r941_prom_high_g303, r941_prom_high_g344,
r941_prom_high_g330, r10_min_high_g303,
r10_min_high_g340, r103_min_high_g345,
r941_prom_snp_g303, r941_prom_variant_g303,
r941_min_high_g340_rle} (default:
/Users/kazu/anaconda3/envs/medaka-
env/lib/python3.6/site-
packages/medaka/data/r941_min_high_g344_model.hdf5)
--disable_cudnn Disable use of cuDNN model layers. (default: False)
--threads THREADS Number of threads used by inference. (default: 1)
--check_output Verify integrity of output file after inference.
(default: False)
--save_features Save features with consensus probabilities. (default:
False)
filter tag:
Filtering alignments by an integer valued tag.
--tag_name TAG_NAME Two-letter tag name. (default: None)
--tag_value TAG_VALUE
Value of tag. (default: None)
--tag_keep_missing Keep alignments when tag is missing. (default: False)
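As a sketch of how the options above combine, the following runs inference on an existing alignment file with four threads and an explicitly named model. The file names `calls_to_draft.bam` and `consensus_probs.hdf` are hypothetical, and the BAM is assumed to be sorted and indexed; the flags are taken from the help text above.

```shell
bam="calls_to_draft.bam"      # hypothetical: reads aligned to the draft assembly
out="consensus_probs.hdf"     # hypothetical output file name
model="r941_min_high_g344"    # default model named in the help text
threads=4

# Run only when medaka is installed and the input alignment exists.
if command -v medaka >/dev/null 2>&1 && [ -f "$bam" ]; then
    medaka consensus "$bam" "$out" --model "$model" --threads "$threads"
else
    echo "skipping: medaka or $bam not available" >&2
fi
```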
> medaka_consensus -h
$ medaka_consensus -h
medaka 0.11.5
------------
Assembly polishing via neural networks. The input assembly should be
preprocessed with racon.
medaka_consensus [-h] -i <fastx>
-h show this help text.
-i fastx input basecalls (required).
-d fasta input assembly (required).
-o output folder (default: medaka).
-m medaka model, (default: r941_min_high_g344).
Available: r941_min_fast_g303, r941_min_high_g303, r941_min_high_g330, r941_min_high_g344, r941_prom_fast_g303, r941_prom_high_g303, r941_prom_high_g344, r941_prom_high_g330, r10_min_high_g303, r10_min_high_g340, r103_min_high_g345, r941_prom_snp_g303, r941_prom_variant_g303, r941_min_high_g340_rle.
Alternatively a .hdf file from 'medaka train'.
-f Force overwrite of outputs (default will reuse existing outputs).
-t number of threads with which to create features (default: 1).
-b batchsize, controls memory use (default: 100).
Usage
The input is a raw de novo assembly produced with, for example, canu or miniasm+racon. Oxford Nanopore expects the input assembly to have already been polished with racon.
medaka_consensus -i basecalled.fa -d draft-assembly.fa -o output
- -i fastx input basecalls (required).
- -d fasta input assembly (required).
- -o output folder (default: medaka).
- -m medaka model (default: r941_min_high_g344).
Results are written to the specified directory.
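Since the input is expected to be racon-polished, the whole procedure might look like the sketch below: map the reads to the draft with minimap2, run one round of racon, then run medaka_consensus on the result. The intermediate file names `reads_to_draft.paf` and `racon.fa`, the thread count, and the use of a single racon round with default parameters are assumptions made for illustration, not settings taken from the medaka documentation.

```shell
reads="basecalled.fa"         # same file names as the command above
draft="draft-assembly.fa"
outdir="output"
threads=4
paf="reads_to_draft.paf"      # hypothetical intermediate file
racon_out="racon.fa"          # hypothetical intermediate file

polish() {
    # 1. Map the reads to the draft assembly (PAF output).
    minimap2 -x map-ont -t "$threads" "$draft" "$reads" > "$paf" &&
    # 2. One round of racon: <reads> <overlaps> <target>.
    racon -t "$threads" "$reads" "$paf" "$draft" > "$racon_out" &&
    # 3. Polish the racon output with medaka.
    medaka_consensus -i "$reads" -d "$racon_out" -o "$outdir" -t "$threads"
}

# Collect the names of any required tools that are not on PATH.
missing=""
for tool in minimap2 racon medaka_consensus; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done

# Run only when all tools and both input files are present.
if [ -z "$missing" ] && [ -f "$reads" ] && [ -f "$draft" ]; then
    polish
else
    echo "skipping: missing tools ($missing ) or input files" >&2
fi
```

If the run completes, the polished assembly should be inside the `output` directory; in the medaka releases I am aware of it is named `consensus.fasta`, but check the directory contents rather than relying on that name.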
Reference
medaka/README.md at master · nanoporetech/medaka · GitHub
Related