A collection of informatics notes related to HTS (NGS).

Using the all command of PanACoTA, a pangenome analysis tool




conda activate panacota

> PanACoTA all -h


usage: PanACoTA all [-c CONFIGFILE] -o OUTDIR [--threads THREADS] [-T NCBI_SPECIES_TAXID] [-s NCBI_SPECIES] [-l LEVELS] [--cutn CUTN] [--l90 L90] [--nbcont NBCONT] [--prodigal] -n NAME [-i MIN_ID]

                    [--tol TOL] [-Mu] [-X] [--soft {fasttree,fastme,quicktree,iqtree,iqtree2}] [-v] [-q] [-h]


 ___                 _____  ___         _____  _____

(  _`\              (  _  )(  _`\      (_   _)(  _  )

| |_) )  _ _   ___  | (_) || ( (_)   _   | |  | (_) |

| ,__/'/'_` )/' _ `\|  _  || |  _  /'_`\ | |  |  _  |

| |   ( (_| || ( ) || | | || (_( )( (_) )| |  | | | |

(_)   `\__,_)(_) (_)(_) (_)(____/'`\___/'(_)  (_) (_)


       Large scale comparative genomics tools




=> Run all PanACoTA modules


General arguments:

  -c CONFIGFILE         Path to your configuration file, defining values of parameters.

  -o OUTDIR             Path to your output folder, where all results from all 6 modules will be saved.

  --threads THREADS     Specify how many threads can be used (default=1)


'prepare' module arguments:


  -T NCBI_SPECIES_TAXID

                        Species taxid to download, corresponding to the 'species taxid' provided by the NCBI. A comma-separated list of taxid can also be provided.

  -s NCBI_SPECIES       Species to download, corresponding to the 'organism name' provided by the NCBI. Give name between quotes (for example "escherichia coli")

  -l LEVELS, --assembly_level LEVELS

                        Assembly levels of genomes to download (default: all). Possible levels are: 'all', 'complete', 'chromosome', 'scaffold', 'contig'. You can also provide a comma-separated

                        list of assembly levels. For ex: 'complete,chromosome'


Common arguments to 'prepare' and 'annotate' modules:

  --cutn CUTN           By default, each genome will be cut into new contigs when at least 5 'N' in a row are found in its sequence. If you don't want to cut genomes into new contigs when there

                        are rows of 'N', put 0 to this option. If you want to cut from a different number of 'N' in a row, put this value to this option.

  --l90 L90             Maximum value of L90 allowed to keep a genome. Default is 100.

  --nbcont NBCONT       Maximum number of contigs allowed to keep a genome. Default is 999.


'annotate' module arguments:

  --prodigal            Add this option if you only want syntactical annotation, given by prodigal, and not functional annotation requiring prokka and is slower.

  -n NAME               Choose a name for your annotated genomes. This name should contain 4 alphanumeric characters. Generally, they correspond to the 2 first letters of genus, and 2 first

                        letters of species, e.g. ESCO for Escherichia Coli.


'pangenome' module arguments:

  -i MIN_ID             Minimum sequence identity to be considered in the same cluster (float between 0 and 1). Default is 0.8.


'corepers' module arguments:

  --tol TOL             min % of genomes having at least 1 member in a family to consider the family as persistent (between 0 and 1, default is 1 = 100% of genomes = Core genome). By default, the

                        minimum number of genomes will be ceil('tol'*N) (N being the total number of genomes). If you want to use floor('tol'*N) instead, add the '-F' option.

  -Mu                   Add this option if you allow several members in any genome of a family. By default, only 1 (or 0 if tol<1) member per genome are allowed in all genomes. If you want to

                        allow exactly 1 member in 'tol'% of the genomes, and 0, 1 or several members in the '1-tol'% left, use the option -X instead of this one: -M and -X options are not

                        compatible.

  -X                    Add this option if you want to allow families having several members only in '1-tol'% of the genomes. In the other genomes, only 1 member exactly is allowed. This option is

                        not compatible with -M (which is allowing multigenic families: having several members in any number of genomes).


'tree' module arguments:

  --soft {fasttree,fastme,quicktree,iqtree,iqtree2}

                        Choose with which software you want to infer the phylogenetic tree. Default is IQtree.



  -v, --verbose         Increase verbosity in stdout/stderr.

  -q, --quiet           Do not display anything to stdout/stderr. log files will still be created.

  -h, --help            show this help message and exit


For more details, see PanACoTA documentation.
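
The --tol description in the help text says a family counts as persistent when it is present in at least ceil(tol*N) genomes (or floor(tol*N) with the -F option). The difference only matters when tol*N is not a whole number, which a small calculation makes concrete. This is only a sketch: min_genomes is a hypothetical helper written for this note, not part of PanACoTA, and the values 0.9 and 25 are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PanACoTA): minimum number of genomes that
# must contain a family for it to be considered persistent.
# Usage: min_genomes TOL N [ceil|floor]
min_genomes() {
    awk -v tol="$1" -v n="$2" -v mode="${3:-ceil}" 'BEGIN {
        x = tol * n
        f = int(x)                        # truncation equals floor for x >= 0
        if (mode == "ceil" && x > f) f++  # round up only if there is a fraction
        print f
    }'
}

min_genomes 0.9 25          # 0.9 * 25 = 22.5, rounded up
min_genomes 0.9 25 floor    # 0.9 * 25 = 22.5, rounded down
min_genomes 1 25            # default tol=1: every genome (strict core genome)
```

With 25 genomes and --tol 0.9, the default (ceil) threshold is therefore one genome stricter than the floor variant.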





git clone

> ls -l PanACoTA/Examples/input_files/










PanACoTA all -c configfile.ini -o outdir -n test
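
The run above takes its parameters from the config file. As a rough sketch of how the same kind of run might be expressed with the command-line flags listed in the help text instead, the script below assembles the command in a bash array and only prints it, so it can be checked before anything is downloaded or computed. All values (the name ESCO, 4 threads, the species, the assembly level, the tree software) are illustrative assumptions, not recommended settings.

```shell
#!/usr/bin/env bash
# Assemble a `PanACoTA all` command using only flags shown in the help text.
# Every value below is an illustrative assumption.
outdir="outdir"
name="ESCO"      # -n: 4 alphanumeric characters, per the help text
threads=4
min_id=0.8       # -i: default value stated in the help text
tol=1            # --tol: default value (strict core genome)

# Check the 4-character rule before building the command.
if [[ ! $name =~ ^[[:alnum:]]{4}$ ]]; then
    echo "name must be exactly 4 alphanumeric characters: $name" >&2
    exit 1
fi

cmd=(PanACoTA all
     -o "$outdir"
     -n "$name"
     --threads "$threads"
     -s "escherichia coli"
     -l complete
     -i "$min_id"
     --tol "$tol"
     --soft iqtree2)

# Dry run: print the command shell-quoted instead of executing it.
printf '%q ' "${cmd[@]}"
echo
```

Once the printed command looks right, it can be run by replacing the printf line with "${cmd[@]}". Keeping the arguments in an array avoids quoting problems with values that contain spaces, such as the species name.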







PanACoTA: a modular tool for massive microbial comparative genomics

Amandine Perrin, Eduardo P.C. Rocha

NAR Genom Bioinform. 2021 Mar; 3(1): lqaa106.