InDelFixer: a sensitive Smith-Waterman aligner for Illumina, 454, and PacBio reads

2019-05-30: added installation notes

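InDelFixer is a sensitive aligner for 454, Illumina, and PacBio data. It aligns next-generation sequencing (NGS) and third-generation reads to a set of reference sequences: fast k-mer matching first finds the approximate mapping region, and a full Smith-Waterman alignment is then computed.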

Installation

Tested on macOS 10.14 with Java 1.8, using the downloaded jar file.

GitHub

Download the .jar file from the releases page. It can also be installed with conda.

#bioconda (link)
conda install -c bioconda indelfixer

$ java -jar InDelFixer.jar

InDelFixer version: 1.1

Get latest version from http://bit.ly/indelfixer

USAGE: java -jar InDelFixer.jar options...

------------------------

=== GENERAL options ===

-o PATH : Path to the output directory (default: current directory).

-i PATH : Path to the NGS input file (FASTA, FASTQ or SFF format) [REQUIRED].

-ir PATH : Path to the second paired end file (FASTQ) [ONLY REQUIRED if first file is also fastq].

-g PATH : Path to the reference genomes file (FASTA format) [REQUIRED].

-r interval : Region on the reference genome (i.e. 342-944).

-k INT : Kmer size (default 10).

-v INT : Kmer offset (default 2).

-cut INT : Cut given number of bases (primer) at 5' and 3' (default 0).

-refine INT : Computes a consensus sequence from alignment and re-aligns against that.

Refinement is repeated as many times as specified.

-mcc INT : Minimal coverage to replace a reference base in the consensus (default 1).

-rmDel : Removes conserved gaps from consensus sequence during refinement.

-sensitive : More sensitive but slower alignment.

-fix : Fill frame-shift causing deletions with consensus sequence.

-noHashing : No fast kmer-matching to find approximate mapping region. Please use with PacBio data!

-realign DOUBLE : Reads are aligned to the whole reference sequence,

if the relative mismatch rate is above the given threshold (default 0.1).

=== FILTER ===

-l INT : Minimal read-length prior alignment (default 0).

-la INT : Minimal read-length after alignment (default 0).

-ins DOUBLE : The maximum percentage of insertions allowed [range 0.0 - 1.0] (default 1.0).

-del DOUBLE : The maximum percentage of deletions allowed [range 0.0 - 1.0] (default 1.0).

-sub DOUBLE : The maximum percentage of substitutions allowed [range 0.0 - 1.0] (default 0.5).

-maxDel INT : The maximum number of consecutive deletions allowed (default no filtering).

-q INT : Minimal average Phred score of the aligned read (default 20).

=== GAP costs ===

-gop : Gap opening costs for Smith-Waterman (default 30).

-gex : Gap extension costs for Smith-Waterman (default 3).

=== GAP costs predefined ===

-454 : 10 open / 1 extend

-illumina : 30 open / 3 extend

-pacbio : 5 open / 3 extend

------------------------

=== EXAMPLES ===

454/Roche : java -jar InDelFixer.jar -i libCase102.fastq -g referenceGenomes.fasta -454

PacBio : java -jar InDelFixer.jar -i libCase102.ccs.fastq -g referenceGenomes.fasta -noHashing -pacbio

Illumina : java -jar InDelFixer.jar -i libCase102_R1.fastq -ir libCase102_R2.fastq -g referenceGenomes.fasta -illumina

------------------------
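The options listed in the help text can be combined in one call. Below is a minimal sketch that only assembles and prints such a command. The file names, the output directory, and the numeric values are placeholders chosen for illustration, not recommended settings, and the jar is assumed to sit in the current directory.

```shell
#!/bin/sh
# Sketch: assemble an InDelFixer command line from options in the help text.
# All file names and numeric values below are illustrative placeholders.
JAR="InDelFixer.jar"          # assumed location of the downloaded release jar
READS_R1="reads_R1.fastq"     # hypothetical forward reads
READS_R2="reads_R2.fastq"     # hypothetical reverse reads
REF="ref.fasta"               # hypothetical reference FASTA
OUTDIR="indelfixer_out"       # hypothetical output directory

CMD="java -jar $JAR -i $READS_R1 -ir $READS_R2 -g $REF -o $OUTDIR"
CMD="$CMD -illumina"          # predefined gap costs: 30 open / 3 extend
CMD="$CMD -r 342-944"         # restrict alignment to a region of the reference
CMD="$CMD -refine 2"          # two rounds of consensus refinement
CMD="$CMD -l 50 -q 25"        # min read length before alignment, min average Phred

# Print the command for inspection instead of running it.
echo "$CMD"
```

Once the printed command looks right, paste it into the terminal to run it.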

Usage

Paired-end FASTQ

java -jar InDelFixer.jar -i pair_R1.fastq -ir pair_R2.fastq -g ref.fasta

-o Path to the output directory (default: current directory).
-i Path to the NGS input file (FASTA, FASTQ or SFF format) [REQUIRED].
-ir Path to the second paired end file (FASTQ) [ONLY REQUIRED if first file is also fastq].
-g Path to the reference genomes file (FASTA format) [REQUIRED].
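For repeated runs it can help to check the inputs before starting Java. The following is a rough sketch, not part of InDelFixer: `run_paired`, `DRY_RUN`, and `JAR` are hypothetical names introduced here. Creating the output directory in advance is a precaution, since it was not checked whether InDelFixer creates it itself.

```shell
#!/bin/sh
# Sketch: validate inputs, then build the paired-end command.
# run_paired is a hypothetical helper name, not an InDelFixer feature.
run_paired() {
    r1=$1; r2=$2; ref=$3; out=${4:-.}
    # Stop early if any input file is missing.
    for f in "$r1" "$r2" "$ref"; do
        if [ ! -f "$f" ]; then
            echo "missing input: $f" >&2
            return 1
        fi
    done
    # Precaution: make sure the output directory exists.
    mkdir -p "$out" || return 1
    # DRY_RUN=1 prints the command instead of launching Java.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo java -jar "${JAR:-InDelFixer.jar}" -i "$r1" -ir "$r2" -g "$ref" -o "$out"
    else
        java -jar "${JAR:-InDelFixer.jar}" -i "$r1" -ir "$r2" -g "$ref" -o "$out"
    fi
}
```

Example call: `DRY_RUN=1 run_paired pair_R1.fastq pair_R2.fastq ref.fasta results` prints the command; dropping `DRY_RUN=1` runs it.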

PacBio CCS reads (ccs.fasta)

java -jar InDelFixer.jar -i libCase102.fasta -g ref.fasta -noHashing
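The EXAMPLES section of the help text pairs `-noHashing` with the `-pacbio` gap-cost preset (5 open / 3 extend). A small sketch for picking the platform flags listed there; `platform_flags` is a hypothetical helper introduced here, and the file names are placeholders.

```shell
#!/bin/sh
# Sketch: map a platform name to the flags shown in the help text.
# platform_flags is a hypothetical helper, not part of InDelFixer.
platform_flags() {
    case "$1" in
        454)      echo "-454" ;;                 # 10 open / 1 extend
        illumina) echo "-illumina" ;;            # 30 open / 3 extend
        pacbio)   echo "-noHashing -pacbio" ;;   # 5 open / 3 extend, k-mer hashing off
        *)        echo "unknown platform: $1" >&2; return 1 ;;
    esac
}

# Print the command for a CCS FASTA file rather than running it.
echo "java -jar InDelFixer.jar -i libCase102.fasta -g ref.fasta $(platform_flags pacbio)"
```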

Reference

https://github.com/cbg-ethz/InDelFixer
