Informatics on a Mac

A collection of notes on informatics for HTS (NGS).

PacBio polishing tools Quiver / Arrow and the variant caller Plurality

 

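 Quiver is a more sophisticated algorithm (compared with Plurality) that finds the maximum quasi-likelihood template sequence given PacBio reads of the template. PacBio reads are modeled using a conditional random field (CRF) approach that scores the quasi-likelihood of a read given a template sequence. In addition to the base sequence of each read, Quiver uses several additional QV covariates provided by the basecaller. These covariates give extra information about each read and allow more accurate consensus calls. Quiver does not use the alignment provided by the mapper (typically BLASR), except to decide how reads are grouped together at a macro level. Because it implicitly performs its own realignment, it is highly sensitive to all variant types, including indels.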
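To picture what "finding the template that maximizes a summed per-read score" means, here is a toy sketch. It is not Quiver's model or implementation: negative edit distance stands in for the CRF quasi-likelihood, no QV covariates are used, and the function names (`edit_distance`, `score`, `single_edits`, `polish`) and the example reads are all hypothetical. Each candidate template is scored against the raw reads directly, so no mapper alignment is needed.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def score(template, reads):
    """Stand-in for the summed read score: higher is better."""
    return -sum(edit_distance(template, read) for read in reads)


def single_edits(template, alphabet="ACGT"):
    """Yield every template that is one insertion, deletion or substitution away."""
    for i in range(len(template) + 1):
        for base in alphabet:
            yield template[:i] + base + template[i:]              # insertion
        if i < len(template):
            yield template[:i] + template[i + 1:]                 # deletion
            for base in alphabet:
                if base != template[i]:
                    yield template[:i] + base + template[i + 1:]  # substitution


def polish(template, reads, max_rounds=20):
    """Greedily apply the best single-base edit until the score stops improving."""
    best = score(template, reads)
    for _ in range(max_rounds):
        candidate = max(single_edits(template), key=lambda t: score(t, reads))
        candidate_score = score(candidate, reads)
        if candidate_score <= best:
            break
        template, best = candidate, candidate_score
    return template


reads = ["ACGTACGT", "ACGTACGT", "ACGTCGT", "ACGTACGGT"]
draft = "ACGTCGT"  # draft template with one missing base
polished = polish(draft, reads)
print(polished)
```

Because the candidates include insertions and deletions as well as substitutions, a sketch like this can repair indels in the draft, which is the property the paragraph above attributes to Quiver's implicit realignment.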

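 Arrow is a newer model intended to supersede Quiver in the near future. Its main differences from Quiver are that it uses an HMM instead of a CRF, computes true likelihoods, and uses a smaller set of covariates. A white paper on Arrow is expected to become available soon.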

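 Plurality is a simple variant calling algorithm. It piles up the aligned reads (alignments produced by BLASR or by an alternative mapping tool) and, at each position, calls the most abundant read base under that reference base (that is, the plurality) as the consensus.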
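The plurality idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the GenomicConsensus code: the pileup is given as one list of read alleles per reference position, "-" marks a deletion, and positions without coverage fall back to the lowercase reference base. The function name `plurality_consensus` and the example data are hypothetical.

```python
from collections import Counter


def plurality_consensus(reference, pileup):
    """Call the most frequent read allele in each reference column.

    pileup[i] lists the read alleles aligned under reference[i];
    "-" denotes a deletion (an assumption of this sketch).
    """
    consensus = []
    variants = []
    for pos, (ref_base, column) in enumerate(zip(reference, pileup)):
        if not column:
            # No coverage: emit the reference base in lowercase.
            consensus.append(ref_base.lower())
            continue
        allele, count = Counter(column).most_common(1)[0]
        if allele != "-":
            consensus.append(allele)
        if allele != ref_base:
            # Record position, reference base, called allele, support, coverage.
            variants.append((pos, ref_base, allele, count, len(column)))
    return "".join(consensus), variants


reference = "ACGT"
pileup = [
    ["A", "A", "A"],  # agrees with the reference
    ["C", "T", "T"],  # substitution C -> T
    ["G", "-", "-"],  # deletion
    [],               # no coverage
]
consensus, variants = plurality_consensus(reference, pileup)
print(consensus)
print(variants)
```

Unlike Quiver and Arrow, this approach trusts the mapper's alignment column by column, so its calls are only as good as the alignment it is given.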

 

 

The names form a pair: a quiver (Quiver) and an arrow (Arrow).

  • How to install and use GenomicConsensus 

GenomicConsensus/HowTo.rst at develop · PacificBiosciences/GenomicConsensus · GitHub

  • Quiver FAQ

https://github.com/PacificBiosciences/GenomicConsensus/blob/develop/doc/FAQ.rst

  • Quiver is described in detail in the supplementary material to the HGAP paper(ref.1).

 

Installation

Tested in a Python 3.6.8 environment on Ubuntu 16.04 (host OS: macOS 10.14).

 

GitHub

GenomicConsensus

PacBio Secondary Analysis Tools on Bioconda (the official PacBio tools distributed through Bioconda)

# Install with conda
conda install -c bioconda genomicconsensus

> quiver -h

$ quiver -h

usage: variantCaller [-h] [--version] [--emit-tool-contract]
                     [--resolved-tool-contract RESOLVED_TOOL_CONTRACT]
                     [--log-file LOG_FILE]
                     [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL} | --debug | --quiet | -v]
                     --referenceFilename REFERENCEFILENAME -o OUTPUTFILENAMES
                     [-j NUMWORKERS] [--minConfidence MINCONFIDENCE]
                     [--minCoverage MINCOVERAGE]
                     [--noEvidenceConsensusCall {nocall,reference,lowercasereference}]
                     [--coverage COVERAGE] [--minMapQV MINMAPQV]
                     [--referenceWindow REFERENCEWINDOWSASSTRING]
                     [--alignmentSetRefWindows]
                     [--referenceWindowsFile REFERENCEWINDOWSASSTRING]
                     [--barcode _BARCODE] [--readStratum READSTRATUM]
                     [--minReadScore MINREADSCORE] [--minSnr MINHQREGIONSNR]
                     [--minZScore MINZSCORE] [--minAccuracy MINACCURACY]
                     [--algorithm {quiver,arrow,plurality,poa,best}]
                     [--parametersFile PARAMETERSFILE]
                     [--parametersSpec PARAMETERSSPEC]
                     [--maskRadius MASKRADIUS] [--maskErrorRate MASKERRORRATE]
                     [--pdb] [--notrace] [--pdbAtStartup] [--profile]
                     [--annotateGFF] [--reportEffectiveCoverage] [--diploid]
                     [--queueSize QUEUESIZE] [--threaded]
                     [--referenceChunkSize REFERENCECHUNKSIZE]
                     [--fancyChunking] [--simpleChunking]
                     [--referenceChunkOverlap REFERENCECHUNKOVERLAP]
                     [--autoDisableHdf5ChunkCache AUTODISABLEHDF5CHUNKCACHE]
                     [--aligner {affine,simple}] [--refineDinucleotideRepeats]
                     [--noRefineDinucleotideRepeats] [--fast]
                     [--skipUnrecognizedContigs]
                     inputFilename

Compute genomic consensus and call variants relative to the reference.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --emit-tool-contract  Emit Tool Contract to stdout (default: False)
  --resolved-tool-contract RESOLVED_TOOL_CONTRACT
                        Run Tool directly from a PacBio Resolved tool contract
                        (default: None)
  --log-file LOG_FILE   Write the log to file. Default(None) will write to
                        stdout. (default: None)
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set log level (default: WARN)
  --debug               Alias for setting log level to DEBUG (default: False)
  --quiet               Alias for setting log level to CRITICAL to suppress
                        output. (default: False)
  -v, --verbose         Set the verbosity level. (default: None)

Basic required options:
  inputFilename         The input cmp.h5 or BAM alignment file
  --referenceFilename REFERENCEFILENAME, --reference REFERENCEFILENAME, -r REFERENCEFILENAME
                        The filename of the reference FASTA file (default:
                        None)
  -o OUTPUTFILENAMES, --outputFilename OUTPUTFILENAMES
                        The output filename(s), as a comma-separated
                        list.Valid output formats are .fa/.fasta, .fq/.fastq,
                        .gff, .vcf (default: )

Parallelism:
  -j NUMWORKERS, --numWorkers NUMWORKERS
                        The number of worker processes to be used (default: 1)

Output filtering:
  --minConfidence MINCONFIDENCE, -q MINCONFIDENCE
                        The minimum confidence for a variant call to be output
                        to variants.{gff,vcf} (default: 40)
  --minCoverage MINCOVERAGE, -x MINCOVERAGE
                        The minimum site coverage that must be achieved for
                        variant calls and consensus to be calculated for a
                        site. (default: 5)
  --noEvidenceConsensusCall {nocall,reference,lowercasereference}
                        The consensus base that will be output for sites with
                        no effective coverage. (default: lowercasereference)

Read selection/filtering:
  --coverage COVERAGE, -X COVERAGE
                        A designation of the maximum coverage level to be used
                        for analysis. Exact interpretation is algorithm-
                        specific. (default: 100)
  --minMapQV MINMAPQV, -m MINMAPQV
                        The minimum MapQV for reads that will be used for
                        analysis. (default: 10)
  --referenceWindow REFERENCEWINDOWSASSTRING, --referenceWindows REFERENCEWINDOWSASSTRING, -w REFERENCEWINDOWSASSTRING
                        The window (or multiple comma-delimited windows) of
                        the reference to be processed, in the format refGroup
                        :refStart-refEnd (default: entire reference).
                        (default: None)
  --alignmentSetRefWindows
                        The window (or multiple comma-delimited windows) of
                        the reference to be processed, in the format refGroup
                        :refStart-refEnd will be pulled from the alignment
                        file. (default: False)
  --referenceWindowsFile REFERENCEWINDOWSASSTRING, -W REFERENCEWINDOWSASSTRING
                        A file containing reference window designations, one
                        per line (default: None)
  --barcode _BARCODE    Only process reads with the given barcode name.
                        (default: None)
  --readStratum READSTRATUM
                        A string of the form 'n/N', where n, and N are
                        integers, 0 <= n < N, designating that the reads are
                        to be deterministically split into N strata of roughly
                        even size, and stratum n is to be used for variant and
                        consensus calling. This is mostly useful for Quiver
                        development. (default: None)
  --minReadScore MINREADSCORE
                        The minimum ReadScore for reads that will be used for
                        analysis (arrow-only). (default: 0.65)
  --minSnr MINHQREGIONSNR
                        The minimum acceptable signal-to-noise over all
                        channels for reads that will be used for analysis
                        (arrow-only). (default: 2.5)
  --minZScore MINZSCORE
                        The minimum acceptable z-score for reads that will be
                        used for analysis (arrow-only). (default: -3.5)
  --minAccuracy MINACCURACY
                        The minimum acceptable window-global alignment
                        accuracy for reads that will be used for the analysis
                        (arrow-only). (default: 0.82)

Algorithm and parameter settings:
  --algorithm {quiver,arrow,plurality,poa,best}
  --parametersFile PARAMETERSFILE, -P PARAMETERSFILE
                        Parameter set filename (such as ArrowParameters.json
                        or QuiverParameters.ini), or directory D such that
                        either D/*/GenomicConsensus/QuiverParameters.ini, or
                        D/GenomicConsensus/QuiverParameters.ini, is found. In
                        the former case, the lexically largest path is chosen.
                        (default: None)
  --parametersSpec PARAMETERSSPEC, -p PARAMETERSSPEC
                        Name of parameter set (chemistry.model) to select from
                        the parameters file, or just the name of the
                        chemistry, in which case the best available model is
                        chosen. Default is 'auto', which selects the best
                        parameter set from the alignment data (default: auto)
  --maskRadius MASKRADIUS
                        Radius of window to use when excluding local regions
                        for exceeding maskMinErrorRate, where 0 disables any
                        filtering (arrow-only). (default: 3)
  --maskErrorRate MASKERRORRATE
                        Maximum local error rate before the local region
                        defined by maskRadius is excluded from polishing
                        (arrow-only). (default: 0.7)

Verbosity and debugging/profiling:
  --pdb                 Enable Python debugger (default: False)
  --notrace             Suppress stacktrace for exceptions (to simplify
                        testing) (default: False)
  --pdbAtStartup        Drop into Python debugger at startup (requires ipdb)
                        (default: False)
  --profile             Enable Python-level profiling (using cProfile).
                        (default: False)
  --annotateGFF         Augment GFF variant records with additional
                        information (default: False)
  --reportEffectiveCoverage
                        Additionally record the *post-filtering* coverage at
                        variant sites (default: False)

Advanced configuration options:
  --diploid             Enable detection of heterozygous variants
                        (experimental) (default: False)
  --queueSize QUEUESIZE, -Q QUEUESIZE
  --threaded, -T        Run threads instead of processes (for debugging
                        purposes only) (default: False)
  --referenceChunkSize REFERENCECHUNKSIZE, -C REFERENCECHUNKSIZE
  --fancyChunking       Adaptive reference chunking designed to handle
                        coverage cutouts better (default: True)
  --simpleChunking      Disable adaptive reference chunking (default: True)
  --referenceChunkOverlap REFERENCECHUNKOVERLAP
  --autoDisableHdf5ChunkCache AUTODISABLEHDF5CHUNKCACHE
                        Disable the HDF5 chunk cache when the number of
                        datasets in the cmp.h5 exceeds the given threshold
                        (default: 500)
  --aligner {affine,simple}, -a {affine,simple}
                        The pairwise alignment algorithm that will be used to
                        produce variant calls from the consensus (Quiver
                        only). (default: affine)
  --refineDinucleotideRepeats
                        Require quiver maximum likelihood search to try one
                        less/more repeat copy in dinucleotide repeats, which
                        seem to be the most frequent cause of suboptimal
                        convergence (getting trapped in local optimum) (Quiver
                        only) (default: True)
  --noRefineDinucleotideRepeats
                        Disable dinucleotide refinement (default: True)
  --fast                Cut some corners to run faster. Unsupported! (default:
                        False)
  --skipUnrecognizedContigs
                        Do not abort when told to process a reference window
                        (via -w/--referenceWindow[s]) that has no aligned
                        coverage. Outputs emptyish files if there are no
                        remaining non-degenerate windows. Only intended for
                        use by smrtpipe scatter/gather. (default: False)
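Going by the help text, a run needs an alignment file, a reference FASTA (-r) and one or more output files (-o, comma-separated), with -j setting the number of worker processes. The following sketch only assembles such a command line and does not execute it. The helper name `build_command` and all file names are hypothetical, and the suffix check simply mirrors the output formats listed in the help.

```python
import shlex

# Output formats listed under -o in the help text.
VALID_SUFFIXES = (".fa", ".fasta", ".fq", ".fastq", ".gff", ".vcf")
# Commands whose -h output is shown in this post.
COMMANDS = ("quiver", "arrow", "plurality")


def build_command(command, alignments, reference, outputs, workers=1):
    """Return an argument list for one of the consensus commands."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    for name in outputs:
        if not name.endswith(VALID_SUFFIXES):
            raise ValueError(f"unsupported output format: {name}")
    return [
        command,
        "-j", str(workers),        # number of worker processes
        "-r", reference,           # reference FASTA
        "-o", ",".join(outputs),   # comma-separated output files
        alignments,                # cmp.h5 or BAM alignment file
    ]


# Hypothetical file names, for illustration only.
cmd = build_command(
    "arrow",
    "aligned.bam",
    "draft.fasta",
    ["polished.fasta", "variants.gff"],
    workers=8,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed. Whether a given alignment file is suitable for Quiver or for Arrow depends on the data, so check the HowTo linked above before running.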

> arrow -h

$ arrow -h

The output is identical to that of quiver -h above; both commands print the variantCaller usage.
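The help text describes two small string formats: reference windows as refGroup:refStart-refEnd (comma-delimited when there are several) and read strata as 'n/N' with 0 <= n < N. A minimal parser for both might look like the sketch below. The function names are hypothetical, and the help does not say whether the coordinates are 0-based or whether the end is inclusive, so the sketch only checks that they are non-negative and ordered.

```python
def parse_window(text):
    """Parse one 'refGroup:refStart-refEnd' window into (name, start, end)."""
    # Split on the last colon so that contig names containing ':' still work.
    name, colon, span = text.strip().rpartition(":")
    start, dash, end = span.partition("-")
    if not (name and colon and dash):
        raise ValueError(f"expected refGroup:refStart-refEnd, got {text!r}")
    start, end = int(start), int(end)
    if not 0 <= start <= end:
        raise ValueError(f"invalid coordinates in {text!r}")
    return name, start, end


def parse_windows(spec):
    """Parse a comma-delimited list of windows."""
    return [parse_window(item) for item in spec.split(",")]


def parse_stratum(text):
    """Parse an 'n/N' read stratum and check that 0 <= n < N."""
    n, _, total = text.partition("/")
    n, total = int(n), int(total)  # raises ValueError if a part is missing
    if not 0 <= n < total:
        raise ValueError(f"need 0 <= n < N, got {text!r}")
    return n, total


# Hypothetical contig names, for illustration only.
windows = parse_windows("contig1:0-5000,contig2:100-200")
stratum = parse_stratum("1/4")
print(windows)
print(stratum)
```

Validating these strings before launching a long run is cheap, and it catches typos such as a missing dash or a stratum like 4/4.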

> plurality -h

$ plurality -h

usage: variantCaller [-h] [--version] [--emit-tool-contract]
                     [--resolved-tool-contract RESOLVED_TOOL_CONTRACT]
                     [--log-file LOG_FILE]
                     [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL} | --debug | --quiet | -v]
                     --referenceFilename REFERENCEFILENAME -o OUTPUTFILENAMES
                     [-j NUMWORKERS] [--minConfidence MINCONFIDENCE]
                     [--minCoverage MINCOVERAGE]
                     [--noEvidenceConsensusCall {nocall,reference,lowercasereference}]
                     [--coverage COVERAGE] [--minMapQV MINMAPQV]
                     [--referenceWindow REFERENCEWINDOWSASSTRING]
                     [--alignmentSetRefWindows]
                     [--referenceWindowsFile REFERENCEWINDOWSASSTRING]
                     [--barcode _BARCODE] [--readStratum READSTRATUM]
                     [--minReadScore MINREADSCORE] [--minSnr MINHQREGIONSNR]
                     [--minZScore MINZSCORE] [--minAccuracy MINACCURACY]
                     [--algorithm {quiver,arrow,plurality,poa,best}]
                     [--parametersFile PARAMETERSFILE]
                     [--parametersSpec PARAMETERSSPEC]
                     [--maskRadius MASKRADIUS] [--maskErrorRate MASKERRORRATE]
                     [--pdb] [--notrace] [--pdbAtStartup] [--profile]
                     [--annotateGFF] [--reportEffectiveCoverage] [--diploid]
                     [--queueSize QUEUESIZE] [--threaded]
                     [--referenceChunkSize REFERENCECHUNKSIZE]
                     [--fancyChunking] [--simpleChunking]
                     [--referenceChunkOverlap REFERENCECHUNKOVERLAP]
                     [--autoDisableHdf5ChunkCache AUTODISABLEHDF5CHUNKCACHE]
                     [--aligner {affine,simple}] [--refineDinucleotideRepeats]
                     [--noRefineDinucleotideRepeats] [--fast]
                     [--skipUnrecognizedContigs]
                     inputFilename

Compute genomic consensus and call variants relative to the reference.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --emit-tool-contract  Emit Tool Contract to stdout (default: False)
  --resolved-tool-contract RESOLVED_TOOL_CONTRACT
                        Run Tool directly from a PacBio Resolved tool contract
                        (default: None)
  --log-file LOG_FILE   Write the log to file. Default(None) will write to
                        stdout. (default: None)
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set log level (default: WARN)
  --debug               Alias for setting log level to DEBUG (default: False)
  --quiet               Alias for setting log level to CRITICAL to suppress
                        output. (default: False)
  -v, --verbose         Set the verbosity level. (default: None)

Basic required options:
  inputFilename         The input cmp.h5 or BAM alignment file
  --referenceFilename REFERENCEFILENAME, --reference REFERENCEFILENAME, -r REFERENCEFILENAME
                        The filename of the reference FASTA file (default:
                        None)
  -o OUTPUTFILENAMES, --outputFilename OUTPUTFILENAMES
                        The output filename(s), as a comma-separated
                        list.Valid output formats are .fa/.fasta, .fq/.fastq,
                        .gff, .vcf (default: )

Parallelism:
  -j NUMWORKERS, --numWorkers NUMWORKERS
                        The number of worker processes to be used (default: 1)

Output filtering:
  --minConfidence MINCONFIDENCE, -q MINCONFIDENCE
                        The minimum confidence for a variant call to be output
                        to variants.{gff,vcf} (default: 40)
  --minCoverage MINCOVERAGE, -x MINCOVERAGE
                        The minimum site coverage that must be achieved for
                        variant calls and consensus to be calculated for a
                        site. (default: 5)
  --noEvidenceConsensusCall {nocall,reference,lowercasereference}
                        The consensus base that will be output for sites with
                        no effective coverage. (default: lowercasereference)

Read selection/filtering:
  --coverage COVERAGE, -X COVERAGE
                        A designation of the maximum coverage level to be used
                        for analysis. Exact interpretation is algorithm-
                        specific. (default: 100)
  --minMapQV MINMAPQV, -m MINMAPQV
                        The minimum MapQV for reads that will be used for
                        analysis. (default: 10)
  --referenceWindow REFERENCEWINDOWSASSTRING, --referenceWindows REFERENCEWINDOWSASSTRING, -w REFERENCEWINDOWSASSTRING
                        The window (or multiple comma-delimited windows) of
                        the reference to be processed, in the format refGroup
                        :refStart-refEnd (default: entire reference).
                        (default: None)
  --alignmentSetRefWindows
                        The window (or multiple comma-delimited windows) of
                        the reference to be processed, in the format refGroup
                        :refStart-refEnd will be pulled from the alignment
                        file. (default: False)
  --referenceWindowsFile REFERENCEWINDOWSASSTRING, -W REFERENCEWINDOWSASSTRING
                        A file containing reference window designations, one
                        per line (default: None)
  --barcode _BARCODE    Only process reads with the given barcode name.
                        (default: None)
  --readStratum READSTRATUM
                        A string of the form 'n/N', where n, and N are
                        integers, 0 <= n < N, designating that the reads are
                        to be deterministically split into N strata of roughly
                        even size, and stratum n is to be used for variant and
                        consensus calling. This is mostly useful for Quiver
                        development. (default: None)
  --minReadScore MINREADSCORE
                        The minimum ReadScore for reads that will be used for
                        analysis (arrow-only). (default: 0.65)
  --minSnr MINHQREGIONSNR
                        The minimum acceptable signal-to-noise over all
                        channels for reads that will be used for analysis
                        (arrow-only). (default: 2.5)
  --minZScore MINZSCORE
                        The minimum acceptable z-score for reads that will be
                        used for analysis (arrow-only). (default: -3.5)
  --minAccuracy MINACCURACY
                        The minimum acceptable window-global alignment
                        accuracy for reads that will be used for the analysis
                        (arrow-only). (default: 0.82)

Algorithm and parameter settings:
  --algorithm {quiver,arrow,plurality,poa,best}
  --parametersFile PARAMETERSFILE, -P PARAMETERSFILE
                        Parameter set filename (such as ArrowParameters.json
                        or QuiverParameters.ini), or directory D such that
                        either D/*/GenomicConsensus/QuiverParameters.ini, or
                        D/GenomicConsensus/QuiverParameters.ini, is found. In
                        the former case, the lexically largest path is chosen.
                        (default: None)
  --parametersSpec PARAMETERSSPEC, -p PARAMETERSSPEC
                        Name of parameter set (chemistry.model) to select from
                        the parameters file, or just the name of the
                        chemistry, in which case the best available model is
                        chosen. Default is 'auto', which selects the best
                        parameter set from the alignment data (default: auto)
  --maskRadius MASKRADIUS
                        Radius of window to use when excluding local regions
                        for exceeding maskMinErrorRate, where 0 disables any
                        filtering (arrow-only). (default: 3)
  --maskErrorRate MASKERRORRATE
                        Maximum local error rate before the local region
                        defined by maskRadius is excluded from polishing
                        (arrow-only). (default: 0.7)

Verbosity and debugging/profiling:
  --pdb                 Enable Python debugger (default: False)
  --notrace             Suppress stacktrace for exceptions (to simplify
                        testing) (default: False)
  --pdbAtStartup        Drop into Python debugger at startup (requires ipdb)
                        (default: False)
  --profile             Enable Python-level profiling (using cProfile).
                        (default: False)
  --annotateGFF         Augment GFF variant records with additional
                        information (default: False)
  --reportEffectiveCoverage
                        Additionally record the *post-filtering* coverage at
                        variant sites (default: False)

Advanced configuration options:
  --diploid             Enable detection of heterozygous variants
                        (experimental) (default: False)
  --queueSize QUEUESIZE, -Q QUEUESIZE
  --threaded, -T        Run threads instead of processes (for debugging
                        purposes only) (default: False)

  --referenceChunkSize REFERENCECHUNKSIZE, -C REFERENCECHUNKSIZE

  --fancyChunking       Adaptive reference chunking designed to handle

                        coverage cutouts better (default: True)

  --simpleChunking      Disable adaptive reference chunking (default: True)

  --referenceChunkOverlap REFERENCECHUNKOVERLAP

  --autoDisableHdf5ChunkCache AUTODISABLEHDF5CHUNKCACHE

                        Disable the HDF5 chunk cache when the number of

                        datasets in the cmp.h5 exceeds the given threshold

                        (default: 500)

  --aligner {affine,simple}, -a {affine,simple}

                        The pairwise alignment algorithm that will be used to

                        produce variant calls from the consensus (Quiver

                        only). (default: affine)

  --refineDinucleotideRepeats

                        Require quiver maximum likelihood search to try one

                        less/more repeat copy in dinucleotide repeats, which

                        seem to be the most frequent cause of suboptimal

                        convergence (getting trapped in local optimum) (Quiver

                        only) (default: True)

  --noRefineDinucleotideRepeats

                        Disable dinucleotide refinement (default: True)

  --fast                Cut some corners to run faster. Unsupported! (default:

                        False)

  --skipUnrecognizedContigs

                        Do not abort when told to process a reference window

                        (via -w/--referenceWindow[s]) that has no aligned

                        coverage. Outputs emptyish files if there are no

                        remaining non-degenerate windows. Only intended for

                        use by smrtpipe scatter/gather. (default: False)
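
The help text above says that -w/--referenceWindow takes one or more comma-delimited windows in the form refGroup:refStart-refEnd. The following is a minimal sketch of building such a string. `make_window` is a hypothetical helper (not part of GenomicConsensus), the contig names and file names are placeholders, and the start < end check is this sketch's own assumption. It only prints a command line and does not run quiver.

```shell
#!/usr/bin/env bash
# Hypothetical helper: format one window as refGroup:refStart-refEnd.
make_window() {
  local contig=$1 start=$2 end=$3
  # Assumption of this sketch: reject empty or reversed ranges.
  if [ "$start" -ge "$end" ]; then
    echo "start must be smaller than end" >&2
    return 1
  fi
  printf '%s:%s-%s\n' "$contig" "$start" "$end"
}

# Join several windows with commas, as -w accepts a comma-delimited list.
# contig1/contig2 are placeholder names.
windows="$(make_window contig1 0 50000),$(make_window contig2 1000 2000)"

# Dry run: print the command instead of executing it.
echo "quiver -j8 aligned_reads.bam -r reference.fasta -w $windows -o variants.gff"
```

Once the printed line looks right, it can be run as-is in a shell where quiver is installed.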

> variantCaller -h

$ variantCaller -h

Prints the same help text as quiver -h above (quiver reports its usage as variantCaller).

 

How to run

Specify the mapped BAM file and the reference (FASTA or XML). Eight worker processes are requested here.

quiver -j8 aligned_reads.bam \
  -r reference.fasta \
  -o variants.gff -o consensus.fasta -o consensus.fastq
  •   -j    The number of worker processes to be used (default: 1)

The same analysis can also be run as variantCaller --algorithm=ALGO, where ALGO is quiver, arrow, or plurality.
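
As a rough illustration of switching algorithms, the sketch below assembles a variantCaller command line for a chosen algorithm and prints it without executing it. `polish_cmd` is a hypothetical helper, and the file names and output prefix are placeholders. The accepted algorithm names are taken from the --algorithm choices in the help text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a variantCaller command for one algorithm.
# Arguments: algorithm, aligned BAM, reference FASTA, output prefix.
polish_cmd() {
  local algo=$1 bam=$2 ref=$3 prefix=$4
  # Accept only the values listed for --algorithm in the help text.
  case "$algo" in
    quiver|arrow|plurality|poa|best) ;;
    *) echo "unknown algorithm: $algo" >&2; return 1 ;;
  esac
  printf 'variantCaller --algorithm=%s -j8 %s -r %s -o %s.gff -o %s.fasta -o %s.fastq\n' \
    "$algo" "$bam" "$ref" "$prefix" "$prefix" "$prefix"
}

# Dry run with placeholder file names.
polish_cmd arrow aligned_reads.bam reference.fasta arrow_out
```

Printing first makes it easy to review the exact options before starting a long polishing run.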

 

Output (from GitHub)

  1. A consensus FASTA file containing the consensus sequence
  2. A consensus FASTQ file containing the consensus sequence with quality annotations
  3. A variants GFF file containing a filtered, annotated list of variants identified
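
A quick sanity check of the three outputs can be sketched as follows. The helper names are hypothetical, and the file names assume the -o values used in the run above. The GFF count simply treats every non-comment, non-empty line as one variant record.

```shell
#!/usr/bin/env bash
# Count sequences in a FASTA file (one header line starting with '>' per record).
count_fasta_records() {
  grep -c '^>' "$1"
}

# Count variant records in a GFF file: lines that are neither comments nor empty.
count_gff_variants() {
  grep -cv -e '^#' -e '^$' "$1"
}

# Report only when the outputs exist (names assumed from the -o options above).
if [ -f consensus.fasta ] && [ -f variants.gff ]; then
  echo "consensus contigs: $(count_fasta_records consensus.fasta)"
  echo "variants called:   $(count_gff_variants variants.gff)"
fi
```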

Citation

ref.1

Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data
Chen-Shan Chin, David H Alexander, Patrick Marks, Aaron A Klammer, James Drake, Cheryl Heiner, Alicia Clum, Alex Copeland, John Huddleston, Evan E Eichler, Stephen W Turner & Jonas Korlach
Nature Methods volume 10, pages 563–569 (2013)

 

Further reading

Polishing PacBio assemblies with Arrow and Pilon

https://flowersoftheocean.wordpress.com/2018/04/16/polishing-pacbio-assemblies-with-arrow-and-pilon/

 

Quiver と Arrowの話 (a blog post on Quiver and Arrow)

https://pacbiobrothers.blogspot.com/2016/12/quver-arrow.html

 

Related