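pbmm2 is a SMRT C++ wrapper for minimap2's C API. Its purpose is to support native PacBio input and output, provide recommended parameter sets, and generate sorted output on the fly (sorting happens while reads are aligned, with no separate sorting step). If BAM is used as input to pbmm2, the sorted output can be used directly for polishing with GenomicConsensus. Benchmarks show that pbmm2 outperforms BLASR in sequence identity, number of mapped bases, and especially runtime. pbmm2 is the official replacement for BLASR.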
Installation
Main repository: GitHub (PacificBiosciences/pbmm2)
This is PacBio's official repository.
# If you use Anaconda, pbmm2 can be installed with conda (macOS, Linux)
conda install -c bioconda -y pbmm2
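To confirm that the install worked, a small check like the one below can be used. It is only a sketch: the helper name `check_pbmm2` is hypothetical, and the commented `conda create` line is one common way to keep pbmm2 in its own environment, not something the pbmm2 docs require.

```shell
# Optional: install into a dedicated environment instead of the base env
#   conda create -n pbmm2 -c conda-forge -c bioconda -y pbmm2
#   conda activate pbmm2

# Hypothetical helper: fail with a hint if pbmm2 is not on PATH,
# otherwise print the installed version via `pbmm2 --version`.
check_pbmm2() {
  if ! command -v pbmm2 >/dev/null 2>&1; then
    echo "pbmm2 not found on PATH; try: conda install -c bioconda -y pbmm2" >&2
    return 1
  fi
  pbmm2 --version
}
```

Running `check_pbmm2` right after installation should print the same version string as `pbmm2 --version`.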
$ pbmm2
pbmm2 - minimap2 with native PacBio BAM support
Usage:
pbmm2 <tool>
Options:
-h, --help Output this help.
--version Output version info.
Tools:
index Index reference and store as .mmi file
align Align PacBio reads to reference sequences
Examples:
pbmm2 align ref.referenceset.xml movie.subreadset.xml ref.movie.alignmentset.xml
pbmm2 index ref.referenceset.xml ref.mmi
Typical workflows:
A. Generate index file for reference and reuse it to align reads
$ pbmm2 index ref.fasta ref.mmi
$ pbmm2 align ref.mmi movie.subreads.bam ref.movie.bam
B. Align reads and sort on-the-fly, with 4 alignment and 2 sort threads
$ pbmm2 align ref.fasta movie.subreads.bam ref.movie.bam --sort -j 4 -J 2
C. Align reads, sort on-the-fly, and create PBI
$ pbmm2 align ref.fasta movie.subreadset.xml ref.movie.alignmentset.xml --sort
D. Omit output file and stream BAM output to stdout
$ pbmm2 align hg38.mmi movie1.subreadset.xml | samtools sort > hg38.movie1.sorted.bam
E. Align CCS fastq input and sort on-the-fly
$ pbmm2 align ref.fasta movie.Q20.fastq ref.movie.bam --preset CCS --sort --rg '@RG\tID:myid\tSM:mysample'
$ pbmm2 index -h
Usage: pbmm2 index [options] <ref.fa|xml> <out.mmi>
Index reference and store as .mmi file
Basic Options:
-h,--help Output this help.
--version Output version information.
--log-file Log to a file, instead of stdout.
--log-level Set log level: "TRACE", "DEBUG", "INFO", "WARN", "FATAL". ["WARN"]
-j,--num-threads Number of threads to use, 0 means autodetection. [0]
Parameter Set Option:
--preset Set alignment mode:
- "SUBREAD" -k 19 -w 10
- "CCS" -k 19 -w 10 -u
- "ISOSEQ" -k 15 -w 5 -u
- "UNROLLED" -k 15 -w 15
Default ["SUBREAD"]
Parameter Override Options:
-k k-mer size (no larger than 28). [-1]
-w Minizer window size. [-1]
-u,--no-kmer-compression Disable homopolymer-compressed k-mer (compression is activate for SUBREAD & UNROLLED presets).
Options:
--emit-tool-contract Emit tool contract.
--resolved-tool-contract Use args from resolved tool contract.
Arguments:
ref.fa|xml Reference FASTA, ReferenceSet XML
out.mmi Output Reference Index
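The -u option disables homopolymer-compressed k-mers. Compression is on for the SUBREAD and UNROLLED presets, and the CCS and ISOSEQ presets switch it off with -u. With compression, every run of identical bases is collapsed to a single base before k-mers are sampled, so seeds become less sensitive to indels inside homopolymer runs. The snippet below only illustrates that idea with `tr` and `awk`; the helper names `hpc` and `kmers` are hypothetical, and this is not pbmm2's or minimap2's actual code.

```shell
# Collapse each homopolymer run to a single base (illustration only).
hpc() {
  printf '%s\n' "$1" | tr -s 'ACGTNacgtn'
}

# Print every overlapping k-mer of a sequence, one per line.
# usage: kmers <sequence> <k>
kmers() {
  printf '%s\n' "$1" | awk -v k="$2" '{
    for (i = 1; i + k - 1 <= length($0); i++) print substr($0, i, k)
  }'
}
```

For example, `hpc ACCCGTTTTA` prints `ACGTA`. The raw sequence yields 8 overlapping 3-mers, the compressed one only 3, and a read with one T more or less in the run (e.g. `ACCCGTTTA`) compresses to the same string.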
$ pbmm2 align -h
Usage: pbmm2 align [options] <ref.fa|xml|mmi> <in.bam|xml|fa|fq> [out.aligned.bam|xml]
Align PacBio reads to reference sequences
Basic Options:
-h,--help Output this help.
--version Output version information.
--log-file Log to a file, instead of stdout.
--log-level Set log level: "TRACE", "DEBUG", "INFO", "WARN", "FATAL". ["WARN"]
--chunk-size Process N records per chunk. [100]
Sorting Options:
--sort Generate sorted BAM file.
-m,--sort-memory Memory per thread for sorting. ["768M"]
Threading Options:
-j,--alignment-threads Number of threads used for alignment, 0 means autodetection. [0]
-J,--sort-threads Number of threads used for sorting; 0 means 25% of -j, maximum 8. [0]
Parameter Set Options:
--preset Set alignment mode:
- "SUBREAD" -k 19 -w 10 -o 5 -O 56 -e 4 -E 1 -A 2 -B 5 -z 400 -Z 50 -r 2000 -L 0.5
- "CCS" -k 19 -w 10 -u -o 5 -O 56 -e 4 -E 1 -A 2 -B 5 -z 400 -Z 50 -r 2000 -L 0.5
- "ISOSEQ" -k 15 -w 5 -u -o 2 -O 32 -e 1 -E 0 -A 1 -B 2 -z 200 -Z 100 -C 5 -r 200000 -G 200000 -L 0.5
- "UNROLLED" -k 15 -w 15 -o 2 -O 32 -e 1 -E 0 -A 1 -B 2 -z 200 -Z 100 -r 2000 -L 0.5
Default ["SUBREAD"]
General Parameter Override Options:
-k k-mer size (no larger than 28). [-1]
-w Minizer window size. [-1]
-u,--no-kmer-compression Disable homopolymer-compressed k-mer (compression is activate for SUBREAD & UNROLLED presets).
-A Matching score. [-1]
-B Mismatch penalty. [-1]
-z Z-drop score. [-1]
-Z Z-drop inversion score. [-1]
-r Bandwidth used in chaining and DP-based alignment. [-1]
Gap Parameter Override Options (a k-long gap costs min{o+k*e,O+k*E}):
-o,--gap-open-1 Gap open penalty 1. [-1]
-O,--gap-open-2 Gap open penalty 2. [-1]
-e,--gap-extend-1 Gap extension penalty 1. [-1]
-E,--gap-extend-2 Gap extension penalty 2. [-1]
-L,--lj-min-ratio Long join flank ratio. [-1]
IsoSeq Parameter Override Options:
-G Max intron length (changes -r). [-1]
-C Cost for a non-canonical GT-AG splicing. [-1]
--no-splice-flank Do not prefer splice flanks GT-AG.
Read Group Options:
--sample Sample name for all read groups. Defaults, in order of precedence: SM field in input read group, biosample name, well sample name, "UnnamedSample".
--rg Read group header line such as '@RG\tID:xyz\tSM:abc'. Only for FASTA/Q inputs.
Output Options:
-c,--min-concordance-perc Minimum alignment concordance in percent. [70]
-l,--min-length Minimum mapped read length in basepair. [50]
-N,--best-n Output at maximum N alignments for each read, 0 means no maximum. [0]
--strip Remove all kinetic and extra QV tags. Output cannot be polished.
--split-by-sample One output BAM per sample.
--no-bai Omit BAI generation for sorted output.
--unmapped Include unmapped records in output.
Input Manipulation Options (mutually exclusive):
--median-filter Pick one read per ZMW of median length.
--zmw Process ZMW Reads, subreadset.xml input required (activates UNROLLED preset).
--hqregion Process HQ region of each ZMW, subreadset.xml input required (activates UNROLLED preset).
Options:
--emit-tool-contract Emit tool contract.
--resolved-tool-contract Use args from resolved tool contract.
Arguments:
ref.fa|xml|mmi Reference FASTA, ReferenceSet XML, or Reference Index
in.bam|xml|fa|fq Input BAM, DataSet XML, FASTA, or FASTQ
out.aligned.bam|xml Output BAM or DataSet XML
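The gap options section states that a gap of length k costs min{o+k*e, O+k*E}. This is a two-piece affine penalty: short gaps are priced by the (o, e) pair and long gaps by the (O, E) pair, which has a higher open cost but a cheaper per-base extension. With the SUBREAD values (o=5, e=4, O=56, E=1) both pieces cost 73 at k=17, and longer gaps use the second piece. The small helper below evaluates the formula; the name `gap_cost` is hypothetical and not part of pbmm2.

```shell
# Two-piece affine gap cost: min(o + k*e, O + k*E)
# usage: gap_cost <o> <e> <O> <E> <k>
gap_cost() {
  local first second
  first=$(( $1 + $5 * $2 ))   # o + k*e
  second=$(( $3 + $5 * $4 ))  # O + k*E
  if [ "$first" -le "$second" ]; then
    echo "$first"
  else
    echo "$second"
  fi
}
```

With the SUBREAD preset, `gap_cost 5 4 56 1 10` gives 45 (first piece) and `gap_cost 5 4 56 1 30` gives 86 (second piece). With the ISOSEQ values (-o 2 -e 1 -O 32 -E 0), any gap longer than 30 bases costs a flat 32, which suits long intron-like deletions.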
How to run
1. Indexing
pbmm2 index ref.fasta ref.mmi
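The index presets above set -k, -w and -u, so the .mmi file bakes in those seeding parameters. It therefore seems safest to build the index with the same --preset you plan to align with, and to build it only once and reuse it (workflow A in the help output). Below is a minimal sketch assuming a POSIX-like shell with `local` (bash or dash); `ensure_index` is a hypothetical helper.

```shell
# Build the .mmi index only if it does not already exist (non-empty).
# usage: ensure_index <ref.fasta|referenceset.xml> <out.mmi> [preset]
# The preset defaults to SUBREAD, which is also pbmm2's default.
ensure_index() {
  local ref=$1 idx=$2 preset=${3:-SUBREAD}
  if [ -s "$idx" ]; then
    echo "reusing existing index: $idx" >&2
    return 0
  fi
  pbmm2 index --preset "$preset" "$ref" "$idx"
}
```

For example, `ensure_index ref.fasta ref.ccs.mmi CCS` builds a CCS index once, and later calls return immediately.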
2. Mapping
#Align CCS fastq input and sort bam output
pbmm2 align ref.fasta movie.Q20.fastq ref.movie.bam --preset CCS --sort --rg '@RG\tID:myid\tSM:mysample'
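The --rg value in the command above is single-quoted, so pbmm2 receives the `\t` separators as literal backslash-t sequences. When several FASTQ samples are aligned in a script, it helps to generate that string instead of typing it. The sketch below mirrors the CCS FASTQ command above; `make_rg` and `align_ccs_fastq` are hypothetical helpers, and reusing the sample name as the read-group ID is an assumption made only for brevity.

```shell
# Print a read-group header with literal "\t" separators,
# e.g. make_rg myid mysample -> @RG\tID:myid\tSM:mysample
make_rg() {
  printf '@RG\\tID:%s\\tSM:%s\n' "$1" "$2"
}

# Align one CCS FASTQ file and write a sorted <sample>.bam.
# usage: align_ccs_fastq <ref.fasta|ref.mmi> <reads.fastq> <sample> [threads]
# threads defaults to 0 (pbmm2 autodetection).
align_ccs_fastq() {
  local ref=$1 fq=$2 sample=$3 threads=${4:-0}
  pbmm2 align "$ref" "$fq" "${sample}.bam" \
    --preset CCS --sort -j "$threads" \
    --rg "$(make_rg "$sample" "$sample")"
}
```

For example, `align_ccs_fastq ref.mmi sampleA.Q20.fastq sampleA 8` runs the same alignment as the command above and writes `sampleA.bam` with `SM:sampleA`.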
#Align reads and sort on-the-fly, with 4 alignment and 2 sort threads
pbmm2 align ref.fasta movie.bam ref.bam --sort -j 4 -J 2
- --sort Generate sorted BAM file
- --rg Read group header line such as '@RG\tID:xyz\tSM:abc'. Only for FASTA/Q inputs.
- -j Number of threads used for alignment, 0 means autodetection. [0]
- -J Number of threads used for sorting; 0 means 25% of -j, maximum 8. [0]
- --preset Set alignment mode:
- "SUBREAD" -k 19 -w 10 -o 5 -O 56 -e 4 -E 1 -A 2 -B 5 -z 400 -Z 50 -r 2000 -L 0.5
- "CCS" -k 19 -w 10 -u -o 5 -O 56 -e 4 -E 1 -A 2 -B 5 -z 400 -Z 50 -r 2000 -L 0.5
- "ISOSEQ" -k 15 -w 5 -u -o 2 -O 32 -e 1 -E 0 -A 1 -B 2 -z 200 -Z 100 -C 5 -r 200000 -G 200000 -L 0.5
- "UNROLLED" -k 15 -w 15 -o 2 -O 32 -e 1 -E 0 -A 1 -B 2 -z 200 -Z 100 -r 2000 -L 0.5
Default ["SUBREAD"]
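Because -m is memory per sort thread (768M by default), the example with -J 2 needs about 2 × 768M for the sorting buffers alone. With -J 0, pbmm2 uses 25% of -j, capped at 8. The helper below gives a rough estimate of that sort memory only; it does not include alignment or index memory. The function names are hypothetical. The helper accepts integer sizes with an M or G suffix, like the 768M default; whether pbmm2 itself accepts every such suffix is not stated in the help text above.

```shell
# Convert an integer size with M or G suffix to MiB.
size_to_mib() {
  case "$1" in
    *[Gg]) echo $(( ${1%[Gg]} * 1024 )) ;;
    *[Mm]) echo "${1%[Mm]}" ;;
    *) echo "size_to_mib: unsupported size '$1'" >&2; return 1 ;;
  esac
}

# Approximate total sort-buffer memory in MiB: sort threads x per-thread memory.
# usage: sort_memory_mib <sort-threads> [per-thread-size, default 768M]
sort_memory_mib() {
  local threads=$1 per_thread=${2:-768M} mib
  mib=$(size_to_mib "$per_thread") || return 1
  echo $(( threads * mib ))
}
```

For `-J 2` with the default `-m`, `sort_memory_mib 2` gives 1536 (MiB); `sort_memory_mib 4 2G` gives 8192.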
References
GitHub - PacificBiosciences/pbmm2: A minimap2 frontend for PacBio native data formats
Related
Structural Variant Detection in SMRT Link 5 with pbsv