A collection of notes on informatics for HTS (NGS).

PEPPER: polishing ONT long-read assemblies

2021 12/24: tweet added







pepper/ at r0.1 · kishwarshafin/pepper · GitHub



I installed it with pip inside a Docker container (Ubuntu 18.04 LTS base image).


python3 -m pip install pepper-polish

#docker (CPU based)
docker run -it --ipc=host --user=`id -u`:`id -g` --cpus="16" \
-v </directory/with/inputs_outputs>:/data kishwars/pepper:latest \
pepper --help

#docker (CPU based)
docker run --rm -it --ipc=host kishwars/pepper:latest pepper torch_stat

#docker (GPU based)
nvidia-docker run -it --ipc=host kishwars/pepper:latest pepper torch_stat
nvidia-docker run -it --ipc=host --user=`id -u`:`id -g` --cpus="16" \
-v </directory/with/inputs_outputs>:/data kishwars/pepper:latest \
pepper --help

> pepper --version

# pepper -h

usage: pepper [-h] [--version]




PEPPER is a RNN based polisher for polishing ONT-based assemblies. It works in three steps:

1) make_images: This module takes alignment file and converts them to HDF5 files containing summary statistics.

2) call_consensus: This module takes the summary images and a trained neural network and generates predictions per base.

3) stitch: This module takes the inference files as input and stitches them to generate a polished assembly.


positional arguments:


    polish              Run the polishing pipeline. This will run make images-> inference -> stitch one after another.

                        The outputs of each step can be run separately using

                        the appropriate sub-command.

    make_images         Generate images that encode summary statistics of reads aligned to an assembly.

    call_consensus      Perform inference on generated images using a trained model.

    stitch              Stitch the polished genome to generate a contiguous polished assembly.

    download_models     Download available models.

    torch_stat          See PyTorch configuration.

    version             Show program version.


optional arguments:

  -h, --help            show this help message and exit

  --version             Show version.

pepper polish -h

# pepper polish -h

usage: pepper polish [-h] -b BAM -f FASTA -m MODEL_PATH -o OUTPUT_FILE

                     [-t THREADS] [-r REGION] [-bs BATCH_SIZE] [-g]

                     [-d_ids DEVICE_IDS] [-w NUM_WORKERS]


optional arguments:

  -h, --help            show this help message and exit

  -b BAM, --bam BAM     BAM file containing mapping between reads and the

                        draft assembly.

  -f FASTA, --fasta FASTA

                        FASTA file containing the draft assembly.

  -m MODEL_PATH, --model_path MODEL_PATH

                        Path to a trained model.

  -o OUTPUT_FILE, --output_file OUTPUT_FILE

                        Path to output file with an expected prefix (i.e. -o


  -t THREADS, --threads THREADS

                        Number of threads to use. Default is 5.

  -r REGION, --region REGION

                        Region in [contig_name:start-end] format

  -bs BATCH_SIZE, --batch_size BATCH_SIZE

                        Batch size for testing, default is 100. Suggested

                        values: 256/512/1024.

  -g, --gpu             If set then PyTorch will use GPUs for inference. CUDA


  -d_ids DEVICE_IDS, --device_ids DEVICE_IDS

                        List of gpu device ids to use for inference. Only used

                        in distributed setting. Example usage: --device_ids

                        0,1,2 (this will create three callers in id 'cuda:0,

                        cuda:1 and cuda:2' If none then it will use all

                        available devices.

  -w NUM_WORKERS, --num_workers NUM_WORKERS

                        Number of workers for loading images. Default is 4.
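The `-r/--region` option in the help above expects a string in `contig_name:start-end` format. A minimal sketch of checking such a string before handing it to `pepper polish` might look like the following. `parse_region` and `Region` are hypothetical names of my own, not part of PEPPER, and the check that start is not larger than end is my assumption; the help text does not say whether coordinates are 0-based or 1-based, so the sketch does not assume either.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Parts of a 'contig_name:start-end' string (hypothetical helper type)."""
    contig: str
    start: int
    end: int


def parse_region(text: str) -> Region:
    """Split 'contig_name:start-end' into contig, start and end."""
    # rpartition splits on the LAST ':' so contig names containing ':' still work.
    contig, colon, span = text.rpartition(":")
    if not colon or not contig:
        raise ValueError(f"expected contig_name:start-end, got {text!r}")
    start_text, dash, end_text = span.partition("-")
    if not dash:
        raise ValueError(f"expected start-end after ':', got {span!r}")
    # int() raises ValueError by itself for non-numeric coordinates.
    start, end = int(start_text), int(end_text)
    if start < 0 or end < start:
        raise ValueError(f"invalid interval {start}-{end}")
    return Region(contig, start, end)


region = parse_region("contig_1:1000-2000")
print(region.contig, region.start, region.end)
```

A string that passes this check can be given to `--region` unchanged; the helper only guards against typos before a long run starts.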





pepper polish \
--bam draft_assembly.bam \
--fasta draft_assembly.fasta \
--model_path <path/to/pepper/models/XXX.pkl> \
--output_file output_file_prefix \
--threads 20 \
--batch_size 128
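When the same run is repeated for several assemblies, it can be convenient to assemble the command line in a script. The sketch below builds the argument list for `pepper polish` using only the long options listed in the help output (`--bam`, `--fasta`, `--model_path`, `--output_file`, `--threads`, `--region`, `--batch_size`, `--gpu`, `--num_workers`). `build_polish_command` is a hypothetical helper of my own, and the model path is a placeholder, not a real file name.

```python
import shlex
from typing import List, Optional


def build_polish_command(
    bam: str,
    fasta: str,
    model_path: str,
    output_prefix: str,
    threads: Optional[int] = None,
    region: Optional[str] = None,
    batch_size: Optional[int] = None,
    gpu: bool = False,
    num_workers: Optional[int] = None,
) -> List[str]:
    """Return the argv list for `pepper polish` (hypothetical helper)."""
    # The four options marked as required in the usage line.
    cmd = [
        "pepper", "polish",
        "--bam", bam,
        "--fasta", fasta,
        "--model_path", model_path,
        "--output_file", output_prefix,
    ]
    # Optional settings are appended only when given, so PEPPER's own
    # defaults apply otherwise.
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if region is not None:
        cmd += ["--region", region]
    if batch_size is not None:
        cmd += ["--batch_size", str(batch_size)]
    if gpu:
        cmd.append("--gpu")
    if num_workers is not None:
        cmd += ["--num_workers", str(num_workers)]
    return cmd


cmd = build_polish_command(
    "draft_assembly.bam",
    "draft_assembly.fasta",
    "path/to/pepper/models/XXX.pkl",  # placeholder path, replace with a real model
    "output_file_prefix",
    threads=20,
    batch_size=128,
)
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

To actually launch it, the list can be passed to `subprocess.run(cmd, check=True)`; passing a list avoids shell quoting problems with unusual file names.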



pepper polish \
--bam draft_assembly.bam \
--fasta draft_assembly.fasta \
--model_path <path/to/pepper/models/XXX.pkl> \
--output_file output_file_prefix \
--threads 20 \
--batch_size 512 \
--gpu \
--num_workers <num_workers>
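The two invocations above differ mainly in `--gpu` and `--num_workers`. `pepper torch_stat` shows the PyTorch configuration from the command line; a rough equivalent inside a script, assuming PyTorch may or may not be installed in the current environment, could be sketched as follows. `cuda_available` and `gpu_flags` are hypothetical helper names, and the worker count of 4 simply mirrors the default given in the help text.

```python
import importlib.util
from typing import List


def cuda_available() -> bool:
    """True only if PyTorch can be imported and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed in this environment
    import torch
    return bool(torch.cuda.is_available())


def gpu_flags(use_gpu: bool, num_workers: int = 4) -> List[str]:
    """Extra `pepper polish` arguments for a GPU run, or nothing for CPU."""
    if use_gpu:
        return ["--gpu", "--num_workers", str(num_workers)]
    return []


# Decide at run time which variant of the command to use.
extra_args = gpu_flags(cuda_available())
print(extra_args)
```

The resulting list can be appended to the CPU command, so a single script works on both kinds of machine.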


For the full workflow from assembly through mapping to polishing, see "Polishing Microbial genome assemblies with PEPPER".
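`pepper polish` needs a BAM file of the reads mapped to the draft assembly. The linked guide is the authoritative source for the exact commands. As a rough illustration only, a common way to produce a sorted, indexed BAM from ONT reads is minimap2 piped into samtools, sketched below. The file names, the thread count, and the helper names `mapping_commands` and `run_mapping` are all my assumptions, not taken from this post.

```python
import subprocess
from typing import List, Tuple


def mapping_commands(
    draft_fasta: str, reads_fastq: str, out_bam: str, threads: int = 8
) -> Tuple[List[str], List[str], List[str]]:
    """Return (align, sort, index) argv lists for a minimap2 + samtools run."""
    # -a: SAM output, -x map-ont: Nanopore read preset, -t: threads
    align = ["minimap2", "-ax", "map-ont", "-t", str(threads), draft_fasta, reads_fastq]
    # Read SAM from stdin ("-") and write a coordinate-sorted BAM.
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    index = ["samtools", "index", out_bam]
    return align, sort, index


def run_mapping(draft_fasta: str, reads_fastq: str, out_bam: str, threads: int = 8) -> None:
    """Run the alignment, piping minimap2 into samtools sort, then index."""
    align, sort, index = mapping_commands(draft_fasta, reads_fastq, out_bam, threads)
    with subprocess.Popen(align, stdout=subprocess.PIPE) as aligner:
        subprocess.run(sort, stdin=aligner.stdout, check=True)
    # Leaving the with-block waits for minimap2, so returncode is set here.
    if aligner.returncode != 0:
        raise subprocess.CalledProcessError(aligner.returncode, align)
    subprocess.run(index, check=True)


# Only print the commands here; call run_mapping(...) to execute them.
for argv in mapping_commands("draft_assembly.fasta", "reads.fastq", "draft_assembly.bam"):
    print(" ".join(argv))
```

The resulting `draft_assembly.bam` and the draft FASTA are then what `--bam` and `--fasta` point to in the `pepper polish` commands above.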


Variant-calling workflow with PEPPER-DeepVariant


pepper/ at r0.1 · kishwarshafin/pepper · GitHub



GitHub - kishwarshafin/pepper: P.E.P.P.E.R. : Program for Evaluating Patterns in Pileups of Erroneous Reads