binny: a binning algorithm that recovers high-quality genomes from complex metagenomic datasets

2022/10/15: paper citation added

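Genome reconstruction is a critical step in genome-resolved metagenomics and in multi-omic data integration from microbial communities. The paper presents binny, a binning tool that produces high-quality metagenome-assembled genomes (MAGs) from both contiguous and highly fragmented genomes. It also finds unique genomes that other methods fail to detect.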

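binny uses k-mer composition and coverage by metagenomic reads for iterative, non-linear dimension reduction of genomic signatures, followed by automated contig clustering in which each cluster is assessed with lineage-specific marker gene sets.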
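To make "k-mer composition plus coverage" concrete, here is a minimal Python sketch of how a contig could be turned into a feature vector before dimension reduction. This is not binny's own code: the function names (`revcomp`, `kmer_profile`, `feature_vector`), the choice of canonical k-mers, and the simple relative-frequency normalization are all assumptions made for illustration, and binny's actual preprocessing may well differ.

```python
from collections import Counter
from itertools import product


def revcomp(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmer_profile(seq, k=4):
    """Relative frequency of each canonical k-mer in seq.

    A k-mer and its reverse complement are counted together, and
    windows containing characters other than A/C/G/T are skipped.
    """
    seq = seq.upper()
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    canonical = sorted({min(km, revcomp(km)) for km in all_kmers})
    counts = Counter()
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if set(km) <= set("ACGT"):
            counts[min(km, revcomp(km))] += 1
    total = sum(counts.values())
    return {km: (counts[km] / total if total else 0.0) for km in canonical}


def feature_vector(seq, mean_coverage, k=4):
    """Concatenate the k-mer profile with the contig's mean read coverage."""
    profile = kmer_profile(seq, k)
    return [profile[km] for km in sorted(profile)] + [float(mean_coverage)]


if __name__ == "__main__":
    # Hypothetical contig and coverage value, for illustration only.
    vec = feature_vector("ACGTACGTTTGACCA", mean_coverage=12.5, k=4)
    print(len(vec))  # number of canonical 4-mers plus one coverage column
```

Vectors like this, one per contig, are the kind of input that a non-linear dimension reduction and a clustering step would then operate on.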

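Compared with five widely used binning algorithms, binny recovered the most near-complete (>95% pure, >90% complete) and high-quality (>90% pure, >70% complete) genomes from simulated datasets of the Critical Assessment of Metagenome Interpretation (CAMI) initiative and from a real-world benchmark comprising metagenomes from various environments. binny is implemented as a Snakemake workflow and is available from https://github.com/a-h-b/binny.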
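The two quality tiers quoted above can be written as a small classifier. This sketch assumes that the first percentage in each pair is purity and the second is completeness, and that both thresholds are strict; the function name `quality_tier` and the example bins are hypothetical and do not come from binny or the paper.

```python
def quality_tier(purity, completeness):
    """Classify a bin by purity and completeness (both in percent).

    Thresholds follow the pairs quoted in the abstract, read here as
    (purity, completeness); that reading is an assumption.
    """
    if purity > 95 and completeness > 90:
        return "near-complete"
    if purity > 90 and completeness > 70:
        return "high-quality"
    return "other"


if __name__ == "__main__":
    # Hypothetical bins: name -> (purity %, completeness %)
    example_bins = {
        "bin_a": (98.2, 93.5),
        "bin_b": (92.0, 75.0),
        "bin_c": (80.0, 99.0),
    }
    for name, (purity, completeness) in example_bins.items():
        print(name, quality_tier(purity, completeness))
```

Note that a bin with high completeness but low purity still falls into "other", which is why both values have to be checked together.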

Installation

I created the environment with mamba and tested it (on Ubuntu 18).

GitHub

git clone https://github.com/a-h-b/binny.git
cd binny
cp runscripts/binny_submit.sh binny
chmod 755 binny
mkdir -p conda
mamba create --prefix $PWD/conda/snakemake_env snakemake=6.9.1 mamba unzip -c conda-forge -c bioconda
conda activate $PWD/conda/snakemake_env
./binny -i config/config.init.yaml

# The database takes up about 5 GB.


> ./binny -h

Usage: ./binny [-u|d|c|l|i] [-b node] [-t number] [-r] [-n name] /absolute_path/to/config_file

-n <name for main job>, only works with -c and -f

-r if set, a report is generated (it's recommended to run -c and -l with -r)

-d if set, a dryrun is performed

-c if set, the whole thing is submitted to the cluster

-b if -c is set, -b gives the node name to submit the main instance to

-i if set, only the conda environments will be installed, if they don't exist

-u if set, the working directory will be unlocked (only necessary for crash/kill recovery)

-l if set, the main snakemake thread and indivdual rules are run in the current terminal session

-t <max_threads> maximum number of cpus to use for all rules at a time. Defaults to 50 for -c, and to 1 for -l. No effect on -r, -d or -u only.

Test run

Specify the YAML config file.

./binny -l -n "TESTRUN" -r config/config.test.yaml

config/config.test.yaml

[Screenshot: contents of config/config.test.yaml]

Output

test_output/

[Screenshot: listing of test_output/]

test_output/bins/

[Screenshot: listing of test_output/bins/]
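Once the run finishes, a quick sanity check is to count the contigs and total bases in each bin. The following is a minimal sketch that assumes the files in test_output/bins/ are plain FASTA files ending in .fasta, .fa, or .fna; the exact file naming used by binny is not confirmed here, and the helper names `fasta_stats` and `summarize_bins` are hypothetical.

```python
from pathlib import Path


def fasta_stats(path):
    """Return (number of sequences, total bases) for one FASTA file."""
    n_seqs = 0
    n_bases = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                n_seqs += 1          # header line starts a new sequence
            else:
                n_bases += len(line)  # sequence line (blank lines add 0)
    return n_seqs, n_bases


def summarize_bins(bin_dir, suffixes=(".fasta", ".fa", ".fna")):
    """Map each FASTA file name in bin_dir to its (contigs, bases)."""
    stats = {}
    for path in sorted(Path(bin_dir).iterdir()):
        if path.is_file() and path.suffix in suffixes:
            stats[path.name] = fasta_stats(path)
    return stats


if __name__ == "__main__":
    bin_dir = "test_output/bins"  # output directory from the test run above
    if Path(bin_dir).is_dir():
        for name, (n_contigs, n_bases) in summarize_bins(bin_dir).items():
            print(f"{name}\t{n_contigs}\t{n_bases}")
```

Run from the binny directory, this prints one tab-separated line per bin; a bin with very few bases or a very large number of short contigs is worth a closer look.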

I read the paper, and binny appears to perform well even against the most recent binners. I plan to try it on my own datasets.

Citation

binny: an automated binning algorithm to recover high-quality genomes from complex metagenomic datasets
Oskar Hickl, Pedro Queirós, Paul Wilmes, Patrick May, Anna Heintz-Buschart

bioRxiv, Posted December 23, 2021

2022/10/15

binny: an automated binning algorithm to recover high-quality genomes from complex metagenomic datasets
Oskar Hickl, Pedro Queirós, Paul Wilmes, Patrick May, Anna Heintz-Buschart
Briefings in Bioinformatics, Published: 13 October 2022