2021 12/24: tweet added
P.E.P.P.P.E.R.は、オックスフォード・ナノポア・シークエンシング技術で動作するように設計されたディープ・ニューラル・ネットワーク・ベースのポリッシャーである。P.E.P.P.E.R.は、各ゲノム位置のサマリー統計からコンセンサス配列を呼び出すために、リカレントニューラルネットワーク(RNN)ベースのエンコーダ-デコーダモデルを使用している。SSWを用いた局所的な再アラインメント処理を用いており、他のツール(例えばracon)を用いた事前のポリッシュを必要としないモジュールとなっている。
Released PEPPER-Margin-DeepVariant r0.7
— Kishwar (@kishwarshafin) December 22, 2021
This release outperforms older versions and existing callers for @nanopore R9.4.1 Guppy 5 "Sup" and R10.4 Q20 data.
Pub: https://t.co/AS2buYWIMA
Free link: https://t.co/of6ZppO5C8 https://t.co/zFPdxtTquI
🧵 on methods + results
[1/10] pic.twitter.com/Q4Aln7H33V
PEPPER v0.1 is now available for polishing @nanopore assembly. Don't only look at the base-quality after polishing, look at frameshifts and transcriptome completeness too.
— Kishwar (@kishwarshafin) October 9, 2020
This framework is integral to PEPPER-DeepVariant work. https://t.co/oP1mKaG64O @BenedictPaten @mitenjain
Installation
I installed it with pip in a Docker container (ubuntu 18.04 LTS base image).
#pip
python3 -m pip install pepper-polish
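A minimal sketch of that setup, assuming a plain ubuntu:18.04 container; the apt package list below is my own guess at the build dependencies and is not taken from the PEPPER docs.
# hypothetical setup inside a fresh ubuntu:18.04 container
docker run -it ubuntu:18.04 bash
# inside the container: python3 + pip (build tools are an assumption, in case the C++ extensions need compiling)
apt-get update && apt-get install -y python3 python3-pip build-essential zlib1g-dev
python3 -m pip install pepper-polish
# quick check that the entry point is on PATH
pepper --version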
#docker (CPU based)
docker run -it --ipc=host --user=`id -u`:`id -g` --cpus="16" \
-v </directory/with/inputs_outputs>:/data kishwars/pepper:latest \
pepper --help
#docker (CPU based)
docker run --rm -it --ipc=host kishwars/pepper:latest pepper torch_stat
#docker (GPU based)
# CHECK GPU STATE:
nvidia-docker run -it --ipc=host kishwars/pepper:latest pepper torch_stat
# RUN PEPPER
nvidia-docker run -it --ipc=host --user=`id -u`:`id -g` --cpus="16" \
-v </directory/with/inputs_outputs>:/data kishwars/pepper:latest \
pepper --help
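An actual polishing run from the container looks roughly like the sketch below, with the mounted directory referenced as /data; the file names and model path are placeholders, and the pepper polish options are the ones documented in the help output further down.
docker run -it --ipc=host --user=`id -u`:`id -g` --cpus="16" \
-v </directory/with/inputs_outputs>:/data kishwars/pepper:latest \
pepper polish \
--bam /data/draft_assembly.bam \
--fasta /data/draft_assembly.fasta \
--model_path /data/<pepper_model>.pkl \
--output_file /data/polished \
--threads 16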
> pepper -h
# pepper -h
usage: pepper [-h] [--version]
{polish,make_images,call_consensus,stitch,download_models,torch_stat,version}
...
PEPPER is a RNN based polisher for polishing ONT-based assemblies. It works in three steps:
1) make_images: This module takes alignment file and converts them to HDF5 files containing summary statistics.
2) call_consensus: This module takes the summary images and a trained neural network and generates predictions per base.
3) stitch: This module takes the inference files as input and stitches them to generate a polished assembly.
positional arguments:
{polish,make_images,call_consensus,stitch,download_models,torch_stat,version}
polish Run the polishing pipeline. This will run make images -> inference -> stitch one after another.
The outputs of each step can be run separately using
the appropriate sub-command.
make_images Generate images that encode summary statistics of reads aligned to an assembly.
call_consensus Perform inference on generated images using a trained model.
stitch Stitch the polished genome to generate a contiguous polished assembly.
download_models Download available models.
torch_stat See PyTorch configuration.
version Show program version.
optional arguments:
-h, --help show this help message and exit
--version Show version.
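As the help notes, the three steps can also be run separately; their per-step options are not reproduced in this post, so query each sub-command's own help first.
pepper make_images -h
pepper call_consensus -h
pepper stitch -h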
> pepper polish -h
# pepper polish -h
usage: pepper polish [-h] -b BAM -f FASTA -m MODEL_PATH -o OUTPUT_FILE
[-t THREADS] [-r REGION] [-bs BATCH_SIZE] [-g]
[-d_ids DEVICE_IDS] [-w NUM_WORKERS]
optional arguments:
-h, --help show this help message and exit
-b BAM, --bam BAM BAM file containing mapping between reads and the
draft assembly.
-f FASTA, --fasta FASTA
FASTA file containing the draft assembly.
-m MODEL_PATH, --model_path MODEL_PATH
Path to a trained model.
-o OUTPUT_FILE, --output_file OUTPUT_FILE
Path to output file with an expected prefix (i.e. -o
./outputs/polished_genome)
-t THREADS, --threads THREADS
Number of threads to use. Default is 5.
-r REGION, --region REGION
Region in [contig_name:start-end] format
-bs BATCH_SIZE, --batch_size BATCH_SIZE
Batch size for testing, default is 100. Suggested
values: 256/512/1024.
-g, --gpu If set then PyTorch will use GPUs for inference. CUDA
required.
-d_ids DEVICE_IDS, --device_ids DEVICE_IDS
List of gpu device ids to use for inference. Only used
in distributed setting. Example usage: --device_ids
0,1,2 (this will create three callers in id 'cuda:0,
cuda:1 and cuda:2' If none then it will use all
available devices.
-w NUM_WORKERS, --num_workers NUM_WORKERS
Number of workers for loading images. Default is 4.
How to run
CPU version
pepper polish \
--bam draft_assembly.bam \
--fasta draft_assembly.fasta \
--model_path <path/to/pepper/models/XXX.pkl> \
--output_file output_file_prefix \
--threads 20 \
--batch_size 128
GPU version
pepper polish \
--bam draft_assembly.bam \
--fasta draft_assembly.fasta \
--model_path <path/to/pepper/models/XXX.pkl> \
--output_file output_file_prefix \
--threads 20 \
--batch_size 512 \
--gpu \
--num_workers <num_workers>
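Before polishing a whole assembly, it can be useful to test on a single window with the --region option from the help above; the contig name and coordinates below are placeholders.
pepper polish \
--bam draft_assembly.bam \
--fasta draft_assembly.fasta \
--model_path <path/to/pepper/models/XXX.pkl> \
--output_file test_region \
--region <contig_name:1-100000> \
--threads 8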
For the full workflow from assembly through mapping to polishing, see Polishing Microbial genome assemblies with PEPPER.
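For reference, the input BAM is simply the ONT reads mapped back to the draft assembly; a typical way to produce it (a minimap2/samtools sketch of my own, not a command taken from the PEPPER docs) is:
# map reads to the draft assembly, then sort and index the alignment
minimap2 -ax map-ont -t 20 draft_assembly.fasta reads.fastq.gz \
| samtools sort -@ 20 -o draft_assembly.bam -
samtools index draft_assembly.bam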
Variant calling workflow with PEPPER-DeepVariant
(In collaboration with the DeepVariant group, a haplotype-aware variant calling pipeline for ONT is being developed.)
pepper/PEPPER_variant_calling.md at r0.1 · kishwarshafin/pepper · GitHub
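For the newer releases announced in the tweet at the top (e.g. r0.7), the whole pipeline ships as a single docker image invoked with one command. The sketch below follows the general shape of that invocation, but the image tag, entry point and chemistry preset flag are written from memory and should be checked against the linked documentation for the release you actually use.
docker run -it --ipc=host --user=`id -u`:`id -g` \
-v </directory/with/inputs_outputs>:/data kishwars/pepper_deepvariant:r0.7 \
run_pepper_margin_deepvariant call_variant \
-b /data/reads_to_reference.bam \
-f /data/reference.fasta \
-o /data/output \
-t 16 \
--ont_r9_guppy5_sup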
Citation
Reference
https://nanoporetech.com/sites/default/files/s3/literature/snv-calling-and-phasing-workflow.pdf
Related