2019-03-23

PacBio polishing tools Quiver / Arrow and the variant caller Plurality

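Quiver is a more sophisticated algorithm than Plurality (described below): it finds the maximum quasi-likelihood template sequence given PacBio reads of that template. PacBio reads are modeled with a conditional random field (CRF) approach that scores the quasi-likelihood of a read given a template sequence. In addition to the base sequence of each read, Quiver uses several additional QV covariates provided by the basecaller. These covariates supply extra information about each read and allow more accurate consensus calls. Quiver does not use the alignment provided by the mapper (typically BLASR), except to decide how reads are grouped together at a macro level. Because it implicitly performs its own realignment, it is highly sensitive to all variant types, including indels.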
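The idea of searching for the template that best explains a set of reads can be sketched in a few lines of Python. This is only a toy illustration, not Quiver's implementation: the score below is the negative sum of edit distances, a stand-in for the CRF quasi-likelihood, and the greedy single-edit search is an assumption made for the example. All function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def score(template, reads):
    """Toy stand-in for a likelihood: higher means the reads fit better."""
    return -sum(edit_distance(template, read) for read in reads)


def single_edits(template, alphabet="ACGT"):
    """Yield every template that differs from the input by one edit."""
    for i in range(len(template) + 1):
        for base in alphabet:
            yield template[:i] + base + template[i:]          # insertion
    for i in range(len(template)):
        yield template[:i] + template[i + 1:]                 # deletion
        for base in alphabet:
            if base != template[i]:
                yield template[:i] + base + template[i + 1:]  # substitution


def polish(draft, reads, max_rounds=50):
    """Greedily apply the best single edit until the score stops improving."""
    best, best_score = draft, score(draft, reads)
    for _ in range(max_rounds):
        candidate = max(single_edits(best), key=lambda t: score(t, reads))
        candidate_score = score(candidate, reads)
        if candidate_score <= best_score:
            break  # local optimum reached
        best, best_score = candidate, candidate_score
    return best


# Made-up reads: most support ACGTTGCA, the draft is missing one T.
reads = ["ACGTTGCA", "ACGTTGCA", "ACGTGCA", "ACGTTGCA", "ACCTTGCA"]
print(polish("ACGTGCA", reads))  # prints ACGTTGCA
```

Because every candidate is scored against the reads directly, the draft's own alignment plays no role here, which is the intuition behind the sensitivity to indels described above.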

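Arrow is a newer model intended to replace Quiver in the near future. The main differences from Quiver are that it uses a hidden Markov model (HMM) instead of a CRF, computes true likelihoods, and uses a smaller set of covariates. A white paper on Arrow is expected to become available soon.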

Plurality is a simple variant-calling algorithm. It piles up the aligned reads (produced by BLASR or by an alternative mapping tool) and, at each reference position, calls the most abundant base (that is, the plurality) as the consensus.
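The pileup-and-vote idea can be sketched as follows. This is a minimal illustration, not the GenomicConsensus code: the pileup is assumed to be given as one list of read bases per reference position, insertions are ignored, and the function name and the lower-case fallback for uncovered sites are choices made for this example.

```python
from collections import Counter


def plurality_consensus(reference, pileup, min_coverage=1):
    """Call the most frequent base in each pileup column.

    reference: reference sequence as a string.
    pileup: one list per reference position holding the read bases aligned
            there ('-' marks a deletion in the read).
    Returns (consensus, variants); variants holds (position, ref_base, call).
    """
    min_coverage = max(min_coverage, 1)  # an empty column can never be called
    consensus = []
    variants = []
    for pos, (ref_base, column) in enumerate(zip(reference, pileup)):
        if len(column) < min_coverage:
            # Too little evidence: fall back to the reference base in lower case.
            consensus.append(ref_base.lower())
            continue
        call, _count = Counter(column).most_common(1)[0]
        if call != ref_base:
            variants.append((pos, ref_base, call))
        if call != "-":
            consensus.append(call)
    return "".join(consensus), variants


# Made-up pileup over a four-base reference.
reference = "ACGT"
pileup = [
    ["A", "A", "A"],
    ["C", "T", "T"],
    ["G", "G", "-"],
    ["T", "T", "T"],
]
consensus, variants = plurality_consensus(reference, pileup)
print(consensus)  # ATGT
print(variants)   # [(1, 'C', 'T')]
```

Since each column is voted on independently, the result depends entirely on the alignment it is given, unlike the realigning approach described for Quiver.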

The names go together: a quiver and an arrow.

How to install and use GenomicConsensus

GenomicConsensus/HowTo.rst at develop · PacificBiosciences/GenomicConsensus · GitHub

Quiver FAQ

https://github.com/PacificBiosciences/GenomicConsensus/blob/develop/doc/FAQ.rst

Quiver is described in detail in the supplementary material to the HGAP paper (ref. 1).


Installation

Tested in a Python 3.6.8 environment on Ubuntu 16.04 (host OS: macOS 10.14).

GitHub

GenomicConsensus

PacBio Secondary Analysis Tools on Bioconda (official PacBio tools available through Bioconda)

# Install with conda
conda install -c bioconda genomicconsensus

> quiver -h

$ quiver -h
usage: variantCaller [-h] [--version] [--emit-tool-contract]
                     [--resolved-tool-contract RESOLVED_TOOL_CONTRACT]
                     [--log-file LOG_FILE]
                     [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL} | --debug | --quiet | -v]
                     --referenceFilename REFERENCEFILENAME -o OUTPUTFILENAMES
                     [-j NUMWORKERS] [--minConfidence MINCONFIDENCE]
                     [--minCoverage MINCOVERAGE]
                     [--noEvidenceConsensusCall {nocall,reference,lowercasereference}]
                     [--coverage COVERAGE] [--minMapQV MINMAPQV]
                     [--referenceWindow REFERENCEWINDOWSASSTRING]
                     [--alignmentSetRefWindows]
                     [--referenceWindowsFile REFERENCEWINDOWSASSTRING]
                     [--barcode _BARCODE] [--readStratum READSTRATUM]
                     [--minReadScore MINREADSCORE] [--minSnr MINHQREGIONSNR]
                     [--minZScore MINZSCORE] [--minAccuracy MINACCURACY]
                     [--algorithm {quiver,arrow,plurality,poa,best}]
                     [--parametersFile PARAMETERSFILE]
                     [--parametersSpec PARAMETERSSPEC]
                     [--maskRadius MASKRADIUS] [--maskErrorRate MASKERRORRATE]
                     [--pdb] [--notrace] [--pdbAtStartup] [--profile]
                     [--annotateGFF] [--reportEffectiveCoverage] [--diploid]
                     [--queueSize QUEUESIZE] [--threaded]
                     [--referenceChunkSize REFERENCECHUNKSIZE]
                     [--fancyChunking] [--simpleChunking]
                     [--referenceChunkOverlap REFERENCECHUNKOVERLAP]
                     [--autoDisableHdf5ChunkCache AUTODISABLEHDF5CHUNKCACHE]
                     [--aligner {affine,simple}] [--refineDinucleotideRepeats]
                     [--noRefineDinucleotideRepeats] [--fast]
                     [--skipUnrecognizedContigs]
                     inputFilename

Compute genomic consensus and call variants relative to the reference.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --emit-tool-contract  Emit Tool Contract to stdout (default: False)
  --resolved-tool-contract RESOLVED_TOOL_CONTRACT
                        Run Tool directly from a PacBio Resolved tool contract
                        (default: None)
  --log-file LOG_FILE   Write the log to file. Default(None) will write to
                        stdout. (default: None)
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set log level (default: WARN)
  --debug               Alias for setting log level to DEBUG (default: False)
  --quiet               Alias for setting log level to CRITICAL to suppress
                        output. (default: False)
  -v, --verbose         Set the verbosity level. (default: None)

Basic required options:
  inputFilename         The input cmp.h5 or BAM alignment file
  --referenceFilename REFERENCEFILENAME, --reference REFERENCEFILENAME, -r REFERENCEFILENAME
                        The filename of the reference FASTA file (default:
                        None)
  -o OUTPUTFILENAMES, --outputFilename OUTPUTFILENAMES
                        The output filename(s), as a comma-separated
                        list.Valid output formats are .fa/.fasta, .fq/.fastq,
                        .gff, .vcf (default: )

Parallelism:
  -j NUMWORKERS, --numWorkers NUMWORKERS
                        The number of worker processes to be used (default: 1)

Output filtering:
  --minConfidence MINCONFIDENCE, -q MINCONFIDENCE
                        The minimum confidence for a variant call to be output
                        to variants.{gff,vcf} (default: 40)
  --minCoverage MINCOVERAGE, -x MINCOVERAGE
                        The minimum site coverage that must be achieved for
                        variant calls and consensus to be calculated for a
                        site. (default: 5)
  --noEvidenceConsensusCall {nocall,reference,lowercasereference}
                        The consensus base that will be output for sites with
                        no effective coverage. (default: lowercasereference)

Read selection/filtering:
  --coverage COVERAGE, -X COVERAGE
                        A designation of the maximum coverage level to be used
                        for analysis. Exact interpretation is algorithm-
                        specific. (default: 100)
  --minMapQV MINMAPQV, -m MINMAPQV
                        The minimum MapQV for reads that will be used for
                        analysis. (default: 10)
  --referenceWindow REFERENCEWINDOWSASSTRING, --referenceWindows REFERENCEWINDOWSASSTRING, -w REFERENCEWINDOWSASSTRING
                        The window (or multiple comma-delimited windows) of
                        the reference to be processed, in the format refGroup
                        :refStart-refEnd (default: entire reference).
                        (default: None)
  --alignmentSetRefWindows
                        The window (or multiple comma-delimited windows) of
                        the reference to be processed, in the format refGroup
                        :refStart-refEnd will be pulled from the alignment
                        file. (default: False)
  --referenceWindowsFile REFERENCEWINDOWSASSTRING, -W REFERENCEWINDOWSASSTRING
                        A file containing reference window designations, one
                        per line (default: None)
  --barcode _BARCODE    Only process reads with the given barcode name.
                        (default: None)
  --readStratum READSTRATUM
                        A string of the form 'n/N', where n, and N are
                        integers, 0 <= n < N, designating that the reads are
                        to be deterministically split into N strata of roughly
                        even size, and stratum n is to be used for variant and
                        consensus calling. This is mostly useful for Quiver
                        development. (default: None)
  --minReadScore MINREADSCORE
                        The minimum ReadScore for reads that will be used for
                        analysis (arrow-only). (default: 0.65)
  --minSnr MINHQREGIONSNR
                        The minimum acceptable signal-to-noise over all
                        channels for reads that will be used for analysis
                        (arrow-only). (default: 2.5)
  --minZScore MINZSCORE
                        The minimum acceptable z-score for reads that will be
                        used for analysis (arrow-only). (default: -3.5)
  --minAccuracy MINACCURACY
                        The minimum acceptable window-global alignment
                        accuracy for reads that will be used for the analysis
                        (arrow-only). (default: 0.82)

Algorithm and parameter settings:
  --algorithm {quiver,arrow,plurality,poa,best}
  --parametersFile PARAMETERSFILE, -P PARAMETERSFILE
                        Parameter set filename (such as ArrowParameters.json
                        or QuiverParameters.ini), or directory D such that
                        either D/*/GenomicConsensus/QuiverParameters.ini, or
                        D/GenomicConsensus/QuiverParameters.ini, is found. In
                        the former case, the lexically largest path is chosen.
                        (default: None)
  --parametersSpec PARAMETERSSPEC, -p PARAMETERSSPEC
                        Name of parameter set (chemistry.model) to select from
                        the parameters file, or just the name of the
                        chemistry, in which case the best available model is
                        chosen. Default is 'auto', which selects the best
                        parameter set from the alignment data (default: auto)
  --maskRadius MASKRADIUS
                        Radius of window to use when excluding local regions
                        for exceeding maskMinErrorRate, where 0 disables any
                        filtering (arrow-only). (default: 3)
  --maskErrorRate MASKERRORRATE
                        Maximum local error rate before the local region
                        defined by maskRadius is excluded from polishing
                        (arrow-only). (default: 0.7)

Verbosity and debugging/profiling:
  --pdb                 Enable Python debugger (default: False)
  --notrace             Suppress stacktrace for exceptions (to simplify
                        testing) (default: False)
  --pdbAtStartup        Drop into Python debugger at startup (requires ipdb)
                        (default: False)
  --profile             Enable Python-level profiling (using cProfile).
                        (default: False)
  --annotateGFF         Augment GFF variant records with additional
                        information (default: False)
  --reportEffectiveCoverage
                        Additionally record the *post-filtering* coverage at
                        variant sites (default: False)

Advanced configuration options:
  --diploid             Enable detection of heterozygous variants
                        (experimental) (default: False)
  --queueSize QUEUESIZE, -Q QUEUESIZE
  --threaded, -T        Run threads instead of processes (for debugging
                        purposes only) (default: False)
  --referenceChunkSize REFERENCECHUNKSIZE, -C REFERENCECHUNKSIZE
  --fancyChunking       Adaptive reference chunking designed to handle
                        coverage cutouts better (default: True)
  --simpleChunking      Disable adaptive reference chunking (default: True)
  --referenceChunkOverlap REFERENCECHUNKOVERLAP
  --autoDisableHdf5ChunkCache AUTODISABLEHDF5CHUNKCACHE
                        Disable the HDF5 chunk cache when the number of
                        datasets in the cmp.h5 exceeds the given threshold
                        (default: 500)
  --aligner {affine,simple}, -a {affine,simple}
                        The pairwise alignment algorithm that will be used to
                        produce variant calls from the consensus (Quiver
                        only). (default: affine)
  --refineDinucleotideRepeats
                        Require quiver maximum likelihood search to try one
                        less/more repeat copy in dinucleotide repeats, which
                        seem to be the most frequent cause of suboptimal
                        convergence (getting trapped in local optimum) (Quiver
                        only) (default: True)
  --noRefineDinucleotideRepeats
                        Disable dinucleotide refinement (default: True)
  --fast                Cut some corners to run faster. Unsupported! (default:
                        False)
  --skipUnrecognizedContigs
                        Do not abort when told to process a reference window
                        (via -w/--referenceWindow[s]) that has no aligned
                        coverage. Outputs emptyish files if there are no
                        remaining non-degenerate windows. Only intended for
                        use by smrtpipe scatter/gather. (default: False)
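The help text gives the reference window format as refGroup:refStart-refEnd, with several windows separated by commas. As a rough sketch of how such a string could be checked before it is passed to -w, the hypothetical helper below parses that documented form only. It is not part of GenomicConsensus, and it makes no assumption about whether coordinates are 0-based or 1-based.

```python
from typing import List, NamedTuple


class RefWindow(NamedTuple):
    ref_group: str
    start: int
    end: int


def parse_reference_windows(spec: str) -> List[RefWindow]:
    """Parse 'refGroup:refStart-refEnd[,refGroup:refStart-refEnd...]'."""
    windows = []
    for item in spec.split(","):
        # Split on the last ':' so that contig names containing ':' still work.
        ref_group, sep, span = item.strip().rpartition(":")
        start_text, dash, end_text = span.partition("-")
        if not (sep and ref_group and dash
                and start_text.isdigit() and end_text.isdigit()):
            raise ValueError("expected refGroup:refStart-refEnd, got %r" % item)
        windows.append(RefWindow(ref_group, int(start_text), int(end_text)))
    return windows


# Contig names and coordinates here are made up for illustration.
for window in parse_reference_windows("contig1:0-5000, contig2:100-200"):
    print(window.ref_group, window.start, window.end)
```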

> arrow -h

(The output is identical to the quiver -h help text above.)

> plurality -h

(The output is identical to the quiver -h help text above.)

> variantCaller -h

(The output is identical to the quiver -h help text above.)

Usage

Specify the mapped BAM file and the reference (.fasta or .xml). Eight worker processes are requested here.

quiver -j8 aligned_reads.bam \
  -r reference.fasta \
  -o variants.gff -o consensus.fasta -o consensus.fastq

-j The number of worker processes to be used (default: 1)

The same run can also be started as variantCaller --algorithm=quiver (or arrow, or plurality).
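When the same call is scripted for several algorithms, the command line can be assembled in Python. The sketch below uses only options that appear in the help text above (--algorithm, -j, -r, -o); the function name and the file names are placeholders chosen for this example, and the actual run is left commented out because it needs GenomicConsensus on the PATH.

```python
import shlex

# Choices listed for --algorithm in the help text.
ALGORITHMS = ("quiver", "arrow", "plurality", "poa", "best")


def build_variant_caller_command(alignments, reference, outputs,
                                 algorithm="quiver", workers=1):
    """Return the variantCaller argument list for one polishing run."""
    if algorithm not in ALGORITHMS:
        raise ValueError("unknown algorithm: %r" % algorithm)
    command = ["variantCaller", "--algorithm", algorithm,
               "-j", str(workers), "-r", reference]
    for path in outputs:
        command += ["-o", path]  # one -o per output file, as in the example above
    command.append(alignments)   # positional inputFilename goes last
    return command


command = build_variant_caller_command(
    "aligned_reads.bam",
    "reference.fasta",
    ["variants.gff", "consensus.fasta", "consensus.fastq"],
    algorithm="arrow",
    workers=8,
)
print(" ".join(shlex.quote(arg) for arg in command))

# To execute the command:
# import subprocess
# subprocess.run(command, check=True)
```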

Output (from GitHub)

A consensus FASTA file containing the consensus sequence
A consensus FASTQ file containing the consensus sequence with quality annotations
A variants GFF file containing a filtered, annotated list of variants identified
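The variants GFF can be post-processed with a few lines of Python. The sketch below assumes the standard nine tab-separated GFF columns with key=value attributes in the last one. The example record and its attribute names (reference, variantSeq, coverage, confidence) are made up for illustration and should be checked against a real variants.gff before relying on them.

```python
def parse_gff_line(line):
    """Split one GFF feature line into its nine standard columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    seqid, _source, feature_type, start, end, _score, strand, _phase, attr_text = fields
    attributes = {}
    for pair in attr_text.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    return {
        "seqid": seqid,
        "type": feature_type,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def read_variants(lines):
    """Yield parsed records, skipping header/comment and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_gff_line(line)


# Hypothetical record; attribute names are assumptions, not taken from real output.
example = [
    "##gff-version 3\n",
    "contig1\t.\tsubstitution\t1200\t1200\t.\t.\t.\t"
    "reference=C;variantSeq=T;coverage=48;confidence=47\n",
]
variants = list(read_variants(example))
# Keep calls at or above 40, the --minConfidence default shown in the help text.
high_confidence = [
    v for v in variants if int(v["attributes"].get("confidence", 0)) >= 40
]
print(len(high_confidence), high_confidence[0]["seqid"], high_confidence[0]["start"])
```

For a real file, open("variants.gff") can be passed to read_variants in place of the example list.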

Citation

ref.1

Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data
Chen-Shan Chin, David H Alexander, Patrick Marks, Aaron A Klammer, James Drake, Cheryl Heiner, Alicia Clum, Alex Copeland, John Huddleston, Evan E Eichler, Stephen W Turner & Jonas Korlach
Nature Methods volume 10, pages 563–569 (2013)

See also

Polishing PacBio assemblies with Arrow and Pilon

https://flowersoftheocean.wordpress.com/2018/04/16/polishing-pacbio-assemblies-with-arrow-and-pilon/

Quiver と Arrowの話 (On Quiver and Arrow)

https://pacbiobrothers.blogspot.com/2016/12/quver-arrow.html