A collection of informatics notes related to HTS (NGS).

ClipKIT: an alignment trimming tool for accurate phylogenetic inference

2020 12/7  Added the citation of the published paper







# install from PyPI
pip install clipkit

# or install from the repository
git clone
cd ClipKIT/
python -m venv .venv
source .venv/bin/activate
make install

$ clipkit -h

      _____ _ _       _  _______ _______
     / ____| (_)     | |/ /_   _|__   __|
    | |    | |_ _ __ | ' /  | |    | |
    | |    | | | '_ \|  <   | |    | |
    | |____| | | |_) | . \ _| |_   | |
     \_____|_|_| .__/|_|\_\_____|  |_|
               | |



Citation: Steenwyk et al. bioRxiv.


ClipKIT trims multiple sequence alignments and maintains phylogenetically informative sites.


Usage: clipkit <input> [optional arguments]


required arguments:
  <input>                                     input file
                                              (must be the first argument)


optional arguments:
  -o, --output <output_file_name>             output file name
                                              (default: input file named with '.clipkit' suffix)

  -m, --modes <gappy,                         trimming mode
              kpic (alias: medium),           (default: gappy)
              kpic-gappy (alias: medium-gappy),
              kpi (alias: heavy),
              kpi-gappy (alias: heavy-gappy)>

  -g, --gaps <threshold of gaps>              specifies gaps threshold
                                              (default: 0.9)

  -if, --input_file_format <file_format>      specifies input file format
                                              (default: auto-detect)

  -of, --output_file_format <file_format>     specifies output file format
                                              (default: same as input file format)

  -l, --log                                   creates a log file
                                              (input file named with '.log' suffix)

  -c, --complementary                         creates complementary alignment of trimmed sequences
                                              (input file named with '.log' suffix)

  -h, --help                                  help message
  -v, --version                               print version



  | Detailed explanation of arguments |

      gappy: trim sites that are greater than the gaps threshold
      kpic (alias: medium): keeps parismony informative and constant sites
      kpic-gappy (alias: medium-gappy): a combination of kpic- and gappy-based trimming
      kpi (alias: heavy): keep only parsimony informative sites
      kpi-gappy (alias: heavy-gappy): a combination of kpi- and gappy-based trimming

      Positions with gappyness greater than threshold will be trimmed.
      Must be between 0 and 1. (Default: 0.9). This argument is ignored
      when using the kpi mode of trimming.

  Input and output file formats
      Supported input and output files include:
      fasta, clustal, maf, mauve, phylip, phylip-sequential,
      phylip-relaxed, and stockholm

      Creates a log file that summarizes the characteristics of each position.
      The log file has four columns.
      - Column 1 is the position in the alignment (starting at 1),
      - Column 2 reports if the site was trimmed or kept (trim and keep, respectively),
      - Column 3 reports if the site is a parsimony informative site or not (PI and nPI, respectively), or
        a constant site or not (Const, nConst), or neither (nConst, nPI)
      - Column 4 reports the gappyness of the the position (number of gaps / entries in alignment)

      Creates an alignment file of only the trimmed sequences



By default (gappy mode), ClipKIT trims alignment positions whose fraction of gaps exceeds the threshold (default: 0.9).

clipkit input.aln
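The gappy trimming described above can be sketched in a few lines of Python. This is a minimal illustration, not ClipKIT's implementation: it assumes the alignment is a list of equal-length strings, treats only '-' and '?' as gaps (ClipKIT's own set of gap characters may differ), and follows the help text in trimming a column only when its gappyness is strictly greater than the threshold (the value given with -g). The helper names are hypothetical.

```python
# Minimal sketch of gappy-mode trimming; not ClipKIT's actual code.
# Assumption: '-' and '?' are the only gap characters.
GAP_CHARS = {"-", "?"}


def gappyness(column: str) -> float:
    """Fraction of entries in one column that are gaps (gaps / entries)."""
    return sum(c in GAP_CHARS for c in column) / len(column)


def gappy_trim(alignment: list[str], threshold: float = 0.9) -> list[str]:
    """Remove columns whose gappyness is greater than the threshold."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    keep = [
        i
        for i, col in enumerate(zip(*alignment))
        if gappyness("".join(col)) <= threshold
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]


aln = ["A-GT-", "A-GTT", "AC-T-", "A--T-"]
print(gappy_trim(aln))                 # default 0.9: no column is trimmed
print(gappy_trim(aln, threshold=0.5))  # → ['AGT', 'AGT', 'A-T', 'A-T']
```

With threshold 0.5, the third column (2 gaps out of 4 entries, exactly 0.5) is kept, while the second and fifth columns (0.75) are trimmed.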

clipkit input.aln -m gappy -o output
  • -m   trimming mode (default: gappy); the other choices are
    kpic (alias: medium),
    kpic-gappy (alias: medium-gappy),
    kpi (alias: heavy),
    kpi-gappy (alias: heavy-gappy)
  • -o   output file name (default: input file named with '.clipkit' suffix)
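The kpi and kpic modes rest on classifying each column. The Python sketch below uses the standard definitions (a parsimony-informative site has at least two character states that each occur in at least two sequences; a constant site has a single state) and, as an assumption, ignores '-' and '?' when counting states. It covers only kpi and kpic; the -gappy variants additionally apply the gap threshold. `classify_site` and `keep_site` are illustrative names, not ClipKIT's API.

```python
from collections import Counter

# Assumption: '-' and '?' are ignored as gap/missing characters when
# counting character states; ClipKIT's own set may differ.
GAP_CHARS = {"-", "?"}


def classify_site(column: str) -> str:
    """Return 'PI', 'Const' or 'other' for one alignment column."""
    counts = Counter(c.upper() for c in column if c not in GAP_CHARS)
    # parsimony informative: >= 2 states that each occur in >= 2 sequences
    if sum(1 for n in counts.values() if n >= 2) >= 2:
        return "PI"
    # constant: a single character state among the non-gap entries
    if len(counts) == 1:
        return "Const"
    return "other"


def keep_site(column: str, mode: str) -> bool:
    """Keep/trim decision for kpi and kpic only (illustrative helper)."""
    site = classify_site(column)
    if mode in ("kpi", "heavy"):
        return site == "PI"
    if mode in ("kpic", "medium"):
        return site in ("PI", "Const")
    raise ValueError(f"mode not covered by this sketch: {mode}")


alignment = ["AAGT-C", "AAGTTC", "ACCTTA", "ACCAGA"]
columns = ["".join(col) for col in zip(*alignment)]
print([classify_site(c) for c in columns])
# → ['Const', 'PI', 'PI', 'other', 'other', 'PI']
```

Here kpi would keep columns 2, 3 and 6, and kpic would also keep the constant first column.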


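To compare the modes on one alignment, the command above can be repeated once per mode. This Python sketch only builds and runs clipkit command lines with the -m and -o options shown in the help output; it assumes clipkit is on PATH, and the output naming scheme `<input>.<mode>.clipkit` and the helper names are hypothetical.

```python
import subprocess

# Mode names as listed by `clipkit -h`.
MODES = ["gappy", "kpic", "kpic-gappy", "kpi", "kpi-gappy"]


def clipkit_command(input_path: str, mode: str) -> list[str]:
    """Build one clipkit call; the input file must be the first argument."""
    output_path = f"{input_path}.{mode}.clipkit"  # hypothetical naming scheme
    return ["clipkit", input_path, "-m", mode, "-o", output_path]


def run_all_modes(input_path: str) -> None:
    """Run clipkit once per mode; stops with CalledProcessError on failure."""
    for mode in MODES:
        subprocess.run(clipkit_command(input_path, mode), check=True)


print(" ".join(clipkit_command("input.aln", "kpi")))
# → clipkit input.aln -m kpi -o input.aln.kpi.clipkit
```

Calling `run_all_modes("input.aln")` would then write one trimmed alignment per mode.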

clipkit input.aln -l
  • -l   creates a log file (input file named with '.log' suffix)
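The log written with -l has the four columns described in the help output (position, keep/trim, site label, gappyness). Assuming one line per position with whitespace-separated fields, which is an assumption about the exact file layout, a short Python sketch can summarize it. The example log content is made up for illustration, and `LogRow`/`parse_log` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LogRow:
    position: int     # column 1: 1-based position in the alignment
    status: str       # column 2: "keep" or "trim"
    label: str        # column 3: site class, e.g. "PI" or "Const"
    gappyness: float  # column 4: gaps / entries


def parse_log(text: str) -> list[LogRow]:
    """Parse log lines, assuming whitespace-separated fields."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        rows.append(
            LogRow(
                position=int(fields[0]),
                status=fields[1],
                # the label may itself contain spaces, so join the middle fields
                label=" ".join(fields[2:-1]),
                gappyness=float(fields[-1]),
            )
        )
    return rows


# Made-up example content, for illustration only.
example = """\
1 keep Const 0.0
2 keep PI 0.25
3 trim nConst,nPI 0.95
"""
rows = parse_log(example)
print(Counter(r.status for r in rows))  # Counter({'keep': 2, 'trim': 1})
```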





clipkit input.aln -l -c
  • -c   creates a complementary alignment that contains only the trimmed sites
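Conceptually, the complementary alignment consists of the columns that the chosen mode removed. A minimal Python sketch, assuming the per-column keep/trim decision is already available as a boolean list (for instance, taken from column 2 of the log file); `split_alignment` is a hypothetical helper, not part of ClipKIT.

```python
def split_alignment(
    alignment: list[str], keep: list[bool]
) -> tuple[list[str], list[str]]:
    """Split columns into the trimmed alignment and its complement."""
    if any(len(seq) != len(keep) for seq in alignment):
        raise ValueError("keep mask must have one entry per column")
    kept = ["".join(c for c, k in zip(seq, keep) if k) for seq in alignment]
    removed = ["".join(c for c, k in zip(seq, keep) if not k) for seq in alignment]
    return kept, removed


aln = ["ACGT-", "ACGTT", "AC-TA"]
mask = [True, False, True, True, False]  # True = keep this column
trimmed, complement = split_alignment(aln, mask)
print(trimmed)     # ['AGT', 'AGT', 'A-T']
print(complement)  # ['C-', 'CT', 'CA']
```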




ClipKIT: a multiple sequence alignment-trimming algorithm for accurate phylogenomic inference

Jacob L. Steenwyk, Thomas J. Buida III, Yuanning Li, Xing-Xing Shen, Antonis Rokas

bioRxiv, Posted June 10, 2020


2020 12/7

ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference

Jacob L Steenwyk, Thomas J Buida 3rd, Yuanning Li, Xing-Xing Shen, Antonis Rokas

PLoS Biol. 2020 Dec 2;18(12)