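2019 5/30 Added installation notes
InDelFixer is a sensitive aligner for 454, Illumina, and PacBio data. It uses a full Smith-Waterman alignment. Next-generation sequencing (NGS) and third-generation reads are aligned to a set of reference sequences, with a fast k-mer matching step run beforehand to find the approximate mapping region.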
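To make the "fast k-mer matching before Smith-Waterman" idea concrete, here is a minimal Python sketch of k-mer seeding. It is not InDelFixer's implementation: the function names, the toy sequences, and the `step` stride are all assumptions made for illustration. The defaults of 10 and 2 only echo the `-k` and `-v` defaults from the help text, and whether `-v` really behaves like this stride is a guess.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference: str, k: int = 10) -> dict:
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def approximate_start(read: str, index: dict, k: int = 10, step: int = 2):
    """Vote for the reference position where the read most likely starts.

    `step` is a hypothetical stride between the read k-mers that are looked up.
    Returns None when no k-mer of the read occurs in the reference.
    """
    votes = Counter()
    for i in range(0, len(read) - k + 1, step):
        for ref_pos in index.get(read[i:i + k], ()):
            # A k-mer at read offset i and reference position ref_pos
            # implies that the read starts at ref_pos - i.
            votes[ref_pos - i] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy data: the read is cut out of the reference at position 12.
reference = "ACGTTGCATGCCGTAAGCTTAGGCTAACGTTAGCCGATCGATTGCAAGT"
read = reference[12:40]
index = build_kmer_index(reference)
print(approximate_start(read, index))  # 12 for this toy input
```

In a real aligner, the expensive Smith-Waterman step would then only be run on a window around the reported position instead of on the whole reference.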
Installation
Tested on macOS 10.14 with Java 1.8, using the downloaded jar file.
Download the .jar file from the releases page. It can also be installed with conda.
#bioconda
conda install -c bioconda indelfixer
$ java -jar InDelFixer.jar
InDelFixer version: 1.1
Get latest version from http://bit.ly/indelfixer
USAGE: java -jar InDelFixer.jar options...
------------------------
=== GENERAL options ===
-o PATH : Path to the output directory (default: current directory).
-i PATH : Path to the NGS input file (FASTA, FASTQ or SFF format) [REQUIRED].
-ir PATH : Path to the second paired end file (FASTQ) [ONLY REQUIRED if first file is also fastq].
-g PATH : Path to the reference genomes file (FASTA format) [REQUIRED].
-r interval : Region on the reference genome (i.e. 342-944).
-k INT : Kmer size (default 10).
-v INT : Kmer offset (default 2).
-cut INT : Cut given number of bases (primer) at 5' and 3' (default 0).
-refine INT : Computes a consensus sequence from alignment and re-aligns against that.
Refinement is repeated as many times as specified.
-mcc INT : Minimal coverage to replace a reference base in the consensus (default 1).
-rmDel : Removes conserved gaps from consensus sequence during refinement.
-sensitive : More sensitive but slower alignment.
-fix : Fill frame-shift causing deletions with consensus sequence.
-noHashing : No fast kmer-matching to find approximate mapping region. Please use with PacBio data!
-realign DOUBLE : Reads are aligned to the whole reference sequence,
if the relative mismatch rate is above the given threshold (default 0.1).
=== FILTER ===
-l INT : Minimal read-length prior alignment (default 0).
-la INT : Minimal read-length after alignment (default 0).
-ins DOUBLE : The maximum percentage of insertions allowed [range 0.0 - 1.0] (default 1.0).
-del DOUBLE : The maximum percentage of deletions allowed [range 0.0 - 1.0] (default 1.0).
-sub DOUBLE : The maximum percentage of substitutions allowed [range 0.0 - 1.0] (default 0.5).
-maxDel INT : The maximum number of consecutive deletions allowed (default no filtering).
-q INT : Minimal average Phred score of the aligned read (default 20).
=== GAP costs ===
-gop : Gap opening costs for Smith-Waterman (default 30).
-gex : Gap extension costs for Smith-Waterman (default 3).
=== GAP costs predefined ===
-454 : 10 open / 1 extend
-illumina : 30 open / 3 extend
-pacbio : 5 open / 3 extend
------------------------
=== EXAMPLES ===
454/Roche : java -jar InDelFixer.jar -i libCase102.fastq -g referenceGenomes.fasta -454
PacBio : java -jar InDelFixer.jar -i libCase102.ccs.fastq -g referenceGenomes.fasta -noHashing -pacbio
Illumina : java -jar InDelFixer.jar -i libCase102_R1.fastq -ir libCase102_R2.fastq -g referenceGenomes.fasta -illumina
------------------------
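The three gap-cost presets in the help output are easier to compare with numbers. The help text does not say how the open and extend costs are combined, so the sketch below assumes the common affine convention in which a gap of length n costs open + (n - 1) * extend. The function name and the chosen gap lengths are illustrative only.

```python
# Gap open / gap extend pairs copied from the "GAP costs predefined" section.
PRESETS = {
    "454": (10, 1),
    "illumina": (30, 3),
    "pacbio": (5, 3),
}


def affine_gap_cost(length: int, gap_open: int, gap_extend: int) -> int:
    """Cost of one gap of `length` bases.

    Assumed convention: the first gapped base pays gap_open and every
    further base pays gap_extend. InDelFixer may count this differently.
    """
    if length <= 0:
        return 0
    return gap_open + (length - 1) * gap_extend


for name, (gap_open, gap_extend) in PRESETS.items():
    costs = [affine_gap_cost(n, gap_open, gap_extend) for n in (1, 3, 10)]
    print(name, costs)
```

Under this assumption a single-base gap is cheapest with the `-pacbio` preset and most expensive with `-illumina`, while long gaps are cheapest with `-454`.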
How to run
Paired-end FASTQ
java -jar InDelFixer.jar -i pair_R1.fastq -ir pair_R2.fastq -g ref.fasta
- -o Path to the output directory (default: current directory).
- -i Path to the NGS input file (FASTA, FASTQ or SFF format) [REQUIRED].
- -ir Path to the second paired end file (FASTQ) [ONLY REQUIRED if first file is also fastq].
- -g Path to the reference genomes file (FASTA format) [REQUIRED].
PacBio CCS FASTA
java -jar InDelFixer.jar -i libCase102.fasta -g ref.fasta -noHashing
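If the two invocations above need to be scripted, for example to loop over many samples, a small Python wrapper can assemble the argument list. This is only a sketch: `build_command` and its parameter names are hypothetical, the jar is assumed to sit in the current directory, and only flags that appear in the help output are used.

```python
import shlex
from typing import List, Optional

GAP_PRESETS = ("454", "illumina", "pacbio")


def build_command(reads: str,
                  reference: str,
                  mate: Optional[str] = None,
                  out_dir: Optional[str] = None,
                  no_hashing: bool = False,
                  preset: Optional[str] = None,
                  jar: str = "InDelFixer.jar") -> List[str]:
    """Assemble an InDelFixer command line as a list of arguments."""
    cmd = ["java", "-jar", jar, "-i", reads]
    if mate:
        cmd += ["-ir", mate]        # second paired-end FASTQ
    cmd += ["-g", reference]
    if out_dir:
        cmd += ["-o", out_dir]      # output directory
    if no_hashing:
        cmd.append("-noHashing")    # recommended for PacBio in the help text
    if preset:
        if preset not in GAP_PRESETS:
            raise ValueError(f"unknown gap-cost preset: {preset}")
        cmd.append("-" + preset)
    return cmd


paired = build_command("pair_R1.fastq", "ref.fasta", mate="pair_R2.fastq")
pacbio = build_command("libCase102.fasta", "ref.fasta", no_hashing=True)
print(shlex.join(paired))
print(shlex.join(pacbio))
```

The lists can be passed to `subprocess.run(paired, check=True)` to start the run. Printing them first with `shlex.join` (Python 3.8 or later) is a cheap way to check the command before any alignment is launched.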
Reference
https://github.com/cbg-ethz/InDelFixer