2020/5/14: added the help output
2021/1/23: added installation with conda
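A multiple alignment can contain regions that barely align at all. Such regions are hard to use as information, so removing them is generally not a problem. trimAl is a trimming tool for multiple alignments that also scales to large datasets: it can remove poorly aligned regions from multiple alignments of thousands of sequences. Supported input formats include Phylip, Fasta, Clustal, NBRF/Pir, Mega and Nexus.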
Manual
http://trimal.cgenomics.org/use_of_the_command_line_trimal_v1.2
http://trimal.cgenomics.org/_media/manual.b.pdf
Installation
Download
http://trimal.cgenomics.org/downloads
Unpack the downloaded archive and build it, or clone the repository from GitHub and build from source:
git clone https://github.com/scapella/trimal.git
cd trimal/source/
make -j
#conda
mamba install -c bioconda trimal
mamba install -c bioconda/label/cf201901 trimal
> ./trimal
trimAl v1.4.rev22 build[2015-05-21]. 2009-2015. Salvador Capella-Gutierrez and Toni Gabaldón.
trimAl webpage: http://trimal.cgenomics.org
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, the last available version.
Please cite:
trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses.
Salvador Capella-Gutierrez; Jose M. Silla-Martinez; Toni Gabaldon.
Bioinformatics 2009, 25:1972-1973.
Basic usage
trimal -in <inputfile> -out <outputfile> -(other options).
Common options (for a complete list please see the User Guide or visit http://trimal.cgenomics.org):
-h Print this information and show some examples.
--version Print the trimAl version.
-in <inputfile> Input file in several formats (clustal, fasta, NBRF/PIR, nexus, phylip3.2, phylip).
-compareset <inputfile> Input list of paths for the files containing the alignments to compare.
-forceselect <inputfile> Force selection of the given input file in the files comparison method.
-backtrans <inputfile> Use a Coding Sequences file to get a backtranslation for a given AA alignment
-ignorestopcodon Ignore stop codons in the input coding sequences
-splitbystopcodon Split input coding sequences up to first stop codon appearance
-matrix <inpufile> Input file for user-defined similarity matrix (default is Blosum62).
--alternative_matrix <name> Select an alternative similarity matrix already loaded.
Only available 'degenerated_nt_identity'
-out <outputfile> Output alignment in the same input format (default stdout). (default input format)
-htmlout <outputfile> Get a summary of trimal's work in an HTML file.
-keepheader Keep original sequence header including non-alphanumeric characters.
Only available for input FASTA format files. (future versions will extend this feature)
-nbrf Output file in NBRF/PIR format
-mega Output file in MEGA format
-nexus Output file in NEXUS format
-clustal Output file in CLUSTAL format
-fasta Output file in FASTA format
-fasta_m10 Output file in FASTA format. Sequences name length up to 10 characters.
-phylip Output file in PHYLIP/PHYLIP4 format
-phylip_m10 Output file in PHYLIP/PHYLIP4 format. Sequences name length up to 10 characters.
-phylip_paml Output file in PHYLIP format compatible with PAML
-phylip_paml_m10 Output file in PHYLIP format compatible with PAML. Sequences name length up to 10 characters.
-phylip3.2 Output file in PHYLIP3.2 format
-phylip3.2_m10 Output file in PHYLIP3.2 format. Sequences name length up to 10 characters.
-complementary Get the complementary alignment.
-colnumbering Get the relationship between the columns in the old and new alignment.
-selectcols { n,l,m-k } Selection of columns to be removed from the alignment. Range: [0 - (Number of Columns - 1)]. (see User Guide).
-selectseqs { n,l,m-k } Selection of sequences to be removed from the alignment. Range: [0 - (Number of Sequences - 1)]. (see User Guide).
-gt -gapthreshold <n> 1 - (fraction of sequences with a gap allowed). Range: [0 - 1]
-st -simthreshold <n> Minimum average similarity allowed. Range: [0 - 1]
-ct -conthreshold <n> Minimum consistency value allowed.Range: [0 - 1]
-cons <n> Minimum percentage of the positions in the original alignment to conserve. Range: [0 - 100]
-nogaps Remove all positions with gaps in the alignment.
-noallgaps Remove columns composed only by gaps.
-keepseqs Keep sequences even if they are composed only by gaps.
-gappyout Use automated selection on "gappyout" mode. This method only uses information based on gaps' distribution. (see User Guide).
-strict Use automated selection on "strict" mode. (see User Guide).
-strictplus Use automated selection on "strictplus" mode. (see User Guide).
(Optimized for Neighbour Joining phylogenetic tree reconstruction).
-automated1 Use a heuristic selection of the automatic method based on similarity statistics. (see User Guide). (Optimized for Maximum Likelihood phylogenetic tree reconstruction).
-terminalonly Only columns out of internal boundaries (first and last column without gaps) are
candidates to be trimmed depending on the selected method
--set_boundaries { l,r } Set manually left (l) and right (r) boundaries - only columns out of these boundaries are
candidates to be trimmed depending on the selected method. Range: [0 - (Number of Columns - 1)]
-block <n> Minimum column block size to be kept in the trimmed alignment. Available with manual and automatic (gappyout) methods
-resoverlap Minimum overlap of a positions with other positions in the column to be considered a "good position". Range: [0 - 1]. (see User Guide).
-seqoverlap Minimum percentage of "good positions" that a sequence must have in order to be conserved. Range: [0 - 100](see User Guide).
-clusters <n> Get the most Nth representatives sequences from a given alignment. Range: [1 - (Number of sequences)]
-maxidentity <n> Get the representatives sequences for a given identity threshold. Range: [0 - 1].
-w <n> (half) Window size, score of position i is the average of the window (i - n) to (i + n).
-gw <n> (half) Window size only applies to statistics/methods based on Gaps.
-sw <n> (half) Window size only applies to statistics/methods based on Similarity.
-cw <n> (half) Window size only applies to statistics/methods based on Consistency.
-sgc Print gap scores for each column in the input alignment.
-sgt Print accumulated gap scores for the input alignment.
-ssc Print similarity scores for each column in the input alignment.
-sst Print accumulated similarity scores for the input alignment.
-sfc Print sum-of-pairs scores for each column from the selected alignment
-sft Print accumulated sum-of-pairs scores for the selected alignment
-sident Print identity scores matrix for all sequences in the input alignment. (see User Guide).
-soverlap Print overlap scores matrix for all sequences in the input alignment. (see User Guide).
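The { n,l,m-k } notation accepted by -selectcols and -selectseqs lists 0-based indices (n, l) and ranges (m-k). The sketch below expands such a specification into individual indices, which can help when checking what will be removed. It is a hypothetical helper, not part of trimAl, and it assumes that a range m-k includes both ends.

```shell
#!/bin/sh
# Hypothetical helper, not part of trimAl: expand a selection such as
# "0,2,5-7" into individual 0-based indices ("0 2 5 6 7").
# Assumes a range m-k includes both m and k. The body runs in a subshell,
# so the IFS change and the variables do not leak to the caller.
expand_selection() (
  spec=$(printf '%s' "$1" | tr -d ' ')   # drop spaces, e.g. " 0, 2 " -> "0,2"
  out=""
  IFS=','                                 # split the spec on commas
  for item in $spec; do
    case "$item" in
      *-*)
        # Range m-k: emit every index from m up to and including k.
        i=${item%-*}
        while [ "$i" -le "${item#*-}" ]; do
          out="$out $i"
          i=$((i + 1))
        done
        ;;
      *)
        out="$out $item"                  # single index
        ;;
    esac
  done
  printf '%s\n' "${out# }"                # strip the leading space
)
```

For example, expand_selection '0,2,5-7' prints 0 2 5 6 7, so trimal -in input.aln -out output.aln -selectcols { 0,2,5-7 } should remove those five columns.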
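Running trimAl
The input is an output file from a multiple alignment tool.
Trim every column in which more than 10% of the sequences have a gap (-gt 0.9) and write the result. -cons 60 keeps at least 60% of the original columns: if the gap threshold alone would leave fewer, trimming stops at 60%.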
trimal -in input.aln -out output.aln -htmlout output.html -gt 0.9 -cons 60
- -in Input file in several formats (clustal, fasta, NBRF/PIR, nexus, phylip3.2, phylip).
- -out Output alignment in the same input format (default stdout). (default input format)
- -htmlout Get a summary of trimal's work in an HTML file.
- -gt 1 - (fraction of sequences with a gap allowed).
- -cons Minimum percentage of the positions in the original alignment to conserve.
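The -gt value is 1 minus the fraction of sequences allowed to have a gap, so -gt 0.9 keeps a column only if at least 90% of its sequences have a residue there. As a minimal sketch of that rule, assuming an aligned FASTA file and counting only '-' as a gap, the hypothetical function below prints the 0-based columns that would pass a given -gt value. It is not trimAl's implementation and ignores -cons, window sizes and all other options.

```shell
#!/bin/sh
# Hypothetical helper, not part of trimAl: print the 0-based columns of an
# aligned FASTA file ($1) that pass a plain gap threshold ($2), where
# gap threshold = 1 - (fraction of sequences allowed to have a gap).
# Only '-' counts as a gap; -cons, windows and other options are ignored.
kept_columns_by_gt() (
  awk -v gt="$2" '
    BEGIN { n = 0; thr = gt + 0 }
    # Join multi-line FASTA records into one string per sequence.
    /^>/ { if (s != "") seqs[n++] = s; s = ""; next }
    { s = s $0 }
    END {
      if (s != "") seqs[n++] = s
      out = ""
      for (c = 1; c <= length(seqs[0]); c++) {
        gaps = 0
        for (i = 0; i < n; i++)
          if (substr(seqs[i], c, 1) == "-") gaps++
        # Keep the column if the fraction of non-gap residues is >= thr.
        if ((n - gaps) / n >= thr)
          out = out (out == "" ? "" : ",") (c - 1)
      }
      print out
    }' "$1"
)
```

Running kept_columns_by_gt input.fasta 0.9 prints a comma-separated list of surviving columns. With -cons 60, trimAl keeps at least 60% of the original columns even if fewer pass -gt, which this sketch does not model.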
Instead of setting thresholds by hand, trimAl can choose them automatically. Four automated methods are available:
trimal -in input.aln -out output.aln -gappyout
- -gappyout Use automated selection on "gappyout" mode. This method only uses information based on gaps' distribution. (see User Guide).
trimal -in input.aln -out output.aln -strict
- -strict Use automated selection on "strict" mode. (see User Guide).
trimal -in input.aln -out output.aln -strictplus
- -strictplus Use automated selection on "strictplus" mode. (see User Guide). (Optimized for Neighbour Joining phylogenetic tree reconstruction).
trimal -in input.aln -out output.aln -automated1
- -automated1 Use a heuristic selection of the automatic method based on similarity statistics. (see User Guide). (Optimized for Maximum Likelihood phylogenetic tree reconstruction).
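To compare the four automated methods on the same input, they can be run in a loop. This sketch assumes trimal is on PATH; the function name and the output naming (prefix.mode.aln and prefix.mode.html) are hypothetical. Each run also writes trimAl's HTML summary via -htmlout.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of trimAl: run every automated mode on the
# alignment $1 and write $2.<mode>.aln plus an HTML summary $2.<mode>.html.
# Assumes the trimal binary is on PATH. Stops at the first failing run.
run_automated_modes() (
  in=$1
  prefix=$2
  for mode in gappyout strict strictplus automated1; do
    trimal -in "$in" -out "$prefix.$mode.aln" \
      -htmlout "$prefix.$mode.html" "-$mode" || exit 1
  done
)
```

run_automated_modes input.aln output writes output.gappyout.aln, output.strict.aln, output.strictplus.aln and output.automated1.aln, plus the matching .html files.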
Citation
trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses
Salvador Capella-Gutiérrez, José M. Silla-Martínez and Toni Gabaldón
Bioinformatics. 2009 Aug 1;25(15):1972-3.
Related