pharokka: a tool for standardized annotation of bacteriophages

2024/01/12: paper citation added

From GitHub

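pharokka is designed for rapid, standardized annotation of bacteriophages. Briefly, gene prediction is done with PHANOTATE (https://github.com/deprekate/PHANOTATE) and functional annotation is based on the PHROGs database (https://phrogs.lmge.uca.fr). The main output is a gff file, which is suitable for use downstream in pangenome pipelines such as Roary (https://sanger-pathogens.github.io/Roary/). The other important output is cds_functions.tsv, which contains counts of CDSs and tRNAs, and of the functions assigned to CDSs according to the PHROGs database.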

I have created pharokka, a dedicated fast phage annotation tool inspired by prokka. Pharokka is designed for users who want one-line, rapid, standarised and scalable phage annotations, much like prokka for prokaryotes. https://t.co/xIN2VClUeb
— George Bouras (@GB13Faithless) June 29, 2022

Installation

I built the environment from the provided YAML file and tested it.

pharokka has been tested on Linux and MacOS (M1 and Intel).

GitHub

#conda (installation tested in a python3.8 environment)
mamba create -n pharokka -y
conda activate pharokka
mamba install pharokka -c gbouras13 -c bioconda -y

#from source, using the provided environment.yml
git clone https://github.com/gbouras13/pharokka.git
cd pharokka
mamba env create -f environment.yml
conda activate pharokka_env

> python bin/pharokka.py -h

usage: pharokka.py [-h] -i INFILE [-o OUTDIR] [-d DATABASE] [-t THREADS] [-f] [-p PREFIX] [-V]

pharokka: phage genome annotation piepline

optional arguments:
  -h, --help            show this help message and exit
  -i INFILE, --infile INFILE
                        input file in fasta format
  -o OUTDIR, --outdir OUTDIR
                        where to write the output
  -d DATABASE, --database DATABASE
                        database directory. If the databases have been install in the default directory, this is not required. Otherwise specify the path
  -t THREADS, --threads THREADS
                        Number of threads for mmseqs and hhsuite. Defaults to 1.
  -f, --force           Overwrites the output directory
  -p PREFIX, --prefix PREFIX
                        Prefix for output files. This is not required
  -V, --version         show program's version number and exit

Preparing the database

Install the PHROGs database.

python bin/install_databases.py -d Y

-d Must be "Y" or "N". Determines whether you want databases stored in the default location, or in a custom directory
-o Database Directory - will be created and must be specified in conjunction with -d N

databases/
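
To keep the databases outside the repository, the two options above can be combined as in the following minimal sketch. The DB_DIR path, the PYTHON override and the function name install_db_custom are my own placeholders, not part of pharokka; only the -d N and -o options come from the option list above.

```shell
# Hypothetical custom location for the databases; any writable path should do.
DB_DIR="${DB_DIR:-$HOME/pharokka_db}"
# Interpreter to use; overridable so the command can be checked without running it.
PYTHON="${PYTHON:-python}"

# Run from the root of the cloned pharokka repository.
install_db_custom() {
    # -d N asks for a custom directory, which -o then names.
    "$PYTHON" bin/install_databases.py -d N -o "$DB_DIR"
}
```

After calling install_db_custom, the same path would presumably be passed to pharokka.py with -d, for example: pharokka.py -i input.fasta -o outdir -d "$DB_DIR" -t 8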

How to run

Specify a FASTA file. If no output directory is given, results are written to output/. Here, 8 threads are specified.

pharokka.py -i input.fasta -o outdir -t 8

-i input file in fasta format
-o where to write the output
-d database directory. If the databases have been installed in the default directory, this is not required. Otherwise specify the path
-t Number of threads for mmseqs and hhsuite. Defaults to 1.
-p Prefix for output files. This is not required

On a standard laptop with 16 GB of RAM and 8 threads specified, pharokka takes 5-20 minutes, depending on genome size (according to GitHub).
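
Since each run takes minutes, annotating several genomes is easiest with a loop. The following is a rough sketch, assuming one phage genome per .fasta file in a single directory. The function name annotate_all, the PHAROKKA override and the directory layout are my own assumptions; only the -i, -o, -t and -p options come from the list above.

```shell
# Command to run; overridable so the loop can be checked without pharokka installed.
PHAROKKA="${PHAROKKA:-pharokka.py}"

# annotate_all IN_DIR OUT_ROOT [THREADS]
# Runs one annotation per *.fasta file, each into its own output directory.
annotate_all() {
    in_dir=$1
    out_root=$2
    threads=${3:-8}
    mkdir -p "$out_root"
    for fasta in "$in_dir"/*.fasta; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$fasta" ] || continue
        name=$(basename "$fasta" .fasta)
        # The file name (without .fasta) is reused as directory name and output prefix.
        "$PHAROKKA" -i "$fasta" -o "$out_root/$name" -t "$threads" -p "$name"
    done
}
```

Example call (paths are placeholders): annotate_all genomes/ annotations/ 8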

Example output
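
The main output is a gff file, so a quick way to get an overview is to count the feature types in it. The sketch below relies only on the generic GFF3 layout (nine tab-separated columns, feature type in column 3) and is not specific to pharokka. The function name count_gff_features is my own.

```shell
# count_gff_features FILE
# Prints "type<TAB>count" for each feature type found in column 3 of a GFF3 file.
count_gff_features() {
    awk -F'\t' '
        /^##FASTA/      { exit }      # stop before an embedded sequence section, if any
        /^#/ || NF < 9  { next }      # skip comments, directives and malformed lines
                        { n[$3]++ }   # tally the feature type
        END             { for (t in n) print t "\t" n[t] }
    ' "$1" | sort
}
```

Example call: count_gff_features outdir/pharokka.gff (the file name here is a placeholder; check the output directory for the actual gff name).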

Citation

GitHub - gbouras13/pharokka: fast phage annotation program

2024/01/12

Pharokka: a fast scalable bacteriophage annotation tool
George Bouras, Roshan Nepal, Ghais Houtak, Alkis James Psaltis, Peter-John Wormald, Sarah Vreugde
Bioinformatics, Volume 39, Issue 1, January 2023