Informatics on Mac (macでインフォマティクス)

This blog collects informatics information related to HTS (NGS).

filter_tree.py: a script for filtering a phylogenetic tree file by tip (leaf) names

8/8: fixed typos

 

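QIIME1's filter_tree.py script (in QIIME2: qiime phylogeny filter-tree) takes a phylogenetic tree file and an input list (OTU names, genome names, etc.) and outputs a subtree that retains only the tips found in that list. If the -n/--negate flag is set, it instead returns the subtree made of the tips that were not found in the list.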
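Which tips end up in the output can be pictured as simple set operations on tip names. The following is only a minimal sketch with standard Unix tools, not how filter_tree.py is implemented internally: the file names and OTU names are hypothetical, and it assumes every name in the list actually occurs in the tree.

```shell
# Hypothetical data: every tip name in the tree, and the list passed with -t.
printf 'OTU1\nOTU2\nOTU3\nOTU4\n' > example_all_tips.txt
printf 'OTU4\nOTU2\n' > example_keep.txt

# comm requires both inputs to be sorted the same way.
sort example_all_tips.txt > example_all_sorted.txt
sort example_keep.txt > example_keep_sorted.txt

# Without -n: tips found in both the tree and the list are retained.
comm -12 example_all_sorted.txt example_keep_sorted.txt > example_retained.txt

# With -n/--negate: tips in the tree that are NOT in the list are retained.
comm -23 example_all_sorted.txt example_keep_sorted.txt > example_retained_negate.txt

cat example_retained.txt         # prints OTU2 and OTU4, one per line
cat example_retained_negate.txt  # prints OTU1 and OTU3, one per line
```

The two outputs are complementary: together they cover every tip of the tree, which is why -n effectively turns a "keep" list into a "remove" list.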

 

QIIME1

filter_tree.py: This script prunes a tree based on a set of tip names (Homepage)

QIIME2

https://docs.qiime2.org/2022.2/plugins/available/phylogeny/filter-tree/?highlight=filter_tree%20py

 

Installation

Because QIIME1 has many dependencies, I used a publicly available (unofficial) QIIME1 Docker image.

QIIME2

QIIME1

#dockerhub, github
docker pull mbari/qiime1:latest

filter_tree.py -h

# filter_tree.py -h

Usage: filter_tree.py [options] {-i/--input_tree_filepath INPUT_TREE_FP -o/--output_tree_filepath OUTPUT_TREE_FP}

[] indicates optional input (order unimportant)
{} indicates required input (order unimportant)

This script takes a tree and a list of OTU IDs (in one of several
supported formats) and outputs a subtree retaining only the tips on
the tree which are found in the inputted list of OTUs (or not found,
if the --negate option is provided).

Example usage:
Print help message and exit
 filter_tree.py -h

Prune a tree to include only the tips in tips_to_keep.txt:
 filter_tree.py -i rep_seqs.tre -t tips_to_keep.txt -o pruned.tre

Prune a tree to remove the tips in tips_to_remove.txt. Note that the
-n/--negate option must be passed for this functionality:
 filter_tree.py -i rep_seqs.tre -t tips_to_keep.txt -o negated.tre -n

Prune a tree to include only the tips found in the fasta file provided:
 filter_tree.py -i rep_seqs.tre -f fast_f.fna -o pruned_fast.tre

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -v, --verbose         Print information during execution -- useful for
                        debugging [default: False]
  -n, --negate          if negate is True will remove input tips/seqs, if
                        negate is False, will retain input tips/seqs [default:
                        False]
  -t TIPS_FP, --tips_fp=TIPS_FP
                        A list of tips (one tip per line) or sequence
                        identifiers   (tab-delimited lines with a seq
                        identifier in the first field)   which should be
                        retained   [default: none]
  -f FASTA_FP, --fasta_fp=FASTA_FP
                        A fasta file where the seq ids should be retained
                        [default: none]

  REQUIRED options:
    The following options must be provided under all circumstances.

    -i INPUT_TREE_FP, --input_tree_filepath=INPUT_TREE_FP
                        input tree filepath [REQUIRED]
    -o OUTPUT_TREE_FP, --output_tree_filepath=OUTPUT_TREE_FP
                        output tree filepath [REQUIRED]
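According to the help text, -t accepts either one tip per line or tab-delimited lines with the sequence ID in the first field. Below is a minimal sketch for building such a list with awk, grep, sed, and cut. The files example_taxonomy.txt and example_seqs.fna and their contents are hypothetical, and treating the first whitespace-delimited word of a FASTA header as the ID is an assumption.

```shell
# Hypothetical tab-delimited file: sequence ID in field 1, taxonomy in field 2.
printf 'OTU1\tk__Bacteria;p__Firmicutes\nOTU2\tk__Bacteria;p__Bacteroidetes\nOTU3\tk__Bacteria;p__Bacteroidetes\n' > example_taxonomy.txt

# Keep only the IDs assigned to Bacteroidetes, written one per line.
awk -F'\t' '$2 ~ /p__Bacteroidetes/ { print $1 }' example_taxonomy.txt > example_tips_keep.txt

# Hypothetical FASTA file: collect the ID (assumed: first word) of each header line.
printf '>OTU2 sample=A\nACGT\n>OTU5\nGGCC\n' > example_seqs.fna
grep '^>' example_seqs.fna | sed 's/^>//' | cut -d' ' -f1 > example_fasta_ids.txt
```

Either output file can then be passed with -t, e.g. filter_tree.py -i input.tre -t example_tips_keep.txt -o output.tre.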

 

 

How to run

1. Here, launch the Docker image and work inside the container environment.

cd <path>/<to>/<tree_dir>/
docker run -itv $PWD:/data -w /data --rm mbari/qiime1:latest
source activate qiime1

 

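2. Specify a list of the OTU names or genome names to keep (one per line), the tree file to filter, and the output tree file name. If you add "-n", the output tree instead keeps only the tips that are not in the list.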

filter_tree.py -i input.tre -t tips_keep.txt -o output.tre
  • -i  input tree filepath [REQUIRED]
  • -t  A list of tips (one tip per line) or sequence identifiers (tab-delimited lines with a seq identifier in the first field) which should be retained [default: none]
  • -o  output tree filepath [REQUIRED]
  • -n  if negate is True will remove input tips/seqs, if negate is False, will retain input tips/seqs
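To check the result, the tip labels of the output tree can be compared against the list. This is a rough sketch that assumes a simple Newick file (unquoted labels, no spaces or comments). The file names are hypothetical stand-ins for output.tre and tips_keep.txt. Any label that directly follows "(" or "," is treated as a tip, so internal-node labels and support values (which follow ")") are skipped.

```shell
# Hypothetical pruned tree and the keep list that was used to produce it.
printf '((OTU2:0.1,OTU4:0.2)0.95:0.3,OTU7:0.4);\n' > example_output.tre
printf 'OTU7\nOTU2\nOTU4\n' > example_keep_list.txt

# Extract tip labels: text after "(" or "," up to the next ( ) , : or ;
grep -oE '[(,][^(),:;]+' example_output.tre | sed 's/^[(,]//' | sort > example_output_tips.txt

# No diff output (exit status 0) means the output tips match the keep list.
sort example_keep_list.txt | diff - example_output_tips.txt && echo "tips match"
```

For a run with -n, the expected tips are instead those in the input tree that are not in the list.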

 

Citation

Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2

Evan Bolyen, Jai Ram Rideout, Matthew R. Dillon, Nicholas A. Bokulich, Christian C. Abnet, Gabriel A. Al-Ghalith, Harriet Alexander, Eric J. Alm, Manimozhiyan Arumugam, Francesco Asnicar, Yang Bai, Jordan E. Bisanz, Kyle Bittinger, Asker Brejnrod, Colin J. Brislawn, C. Titus Brown, Benjamin J. Callahan, Andrés Mauricio Caraballo-Rodríguez, John Chase, Emily K. Cope, Ricardo Da Silva, Christian Diener, Pieter C. Dorrestein, Gavin M. Douglas, Daniel M. Durall, Claire Duvallet, Christian F. Edwardson, Madeleine Ernst, Mehrbod Estaki, Jennifer Fouquier, Julia M. Gauglitz, Sean M. Gibbons, Deanna L. Gibson, Antonio Gonzalez, Kestrel Gorlick, Jiarong Guo, Benjamin Hillmann, Susan Holmes, Hannes Holste, Curtis Huttenhower, Gavin A. Huttley, Stefan Janssen, Alan K. Jarmusch, Lingjing Jiang, Benjamin D. Kaehler, Kyo Bin Kang, Christopher R. Keefe, Paul Keim, Scott T. Kelley, Dan Knights, Irina Koester, Tomasz Kosciolek, Jorden Kreps, Morgan G. I. Langille, Joslynn Lee, Ruth Ley, Yong-Xin Liu, Erikka Loftfield, Catherine Lozupone, Massoud Maher, Clarisse Marotz, Bryan D. Martin, Daniel McDonald, Lauren J. McIver, Alexey V. Melnik, Jessica L. Metcalf, Sydney C. Morgan, Jamie T. Morton, Ahmad Turan Naimey, Jose A. Navas-Molina, Louis Felix Nothias, Stephanie B. Orchanian, Talima Pearson, Samuel L. Peoples, Daniel Petras, Mary Lai Preuss, Elmar Pruesse, Lasse Buur Rasmussen, Adam Rivers, Michael S. Robeson II, Patrick Rosenthal, Nicola Segata, Michael Shaffer, Arron Shiffer, Rashmi Sinha, Se Jin Song, John R. Spear, Austin D. Swafford, Luke R. Thompson, Pedro J. Torres, Pauline Trinh, Anupriya Tripathi, Peter J. Turnbaugh, Sabah Ul-Hasan, Justin J. J. van der Hooft, Fernando Vargas, Yoshiki Vázquez-Baeza, Emily Vogtmann, Max von Hippel, William Walters, Yunhu Wan, Mingxun Wang, Jonathan Warren, Kyle C. Weber, Charles H. D. Williamson, Amy D. Willis, Zhenjiang Zech Xu, Jesse R. Zaneveld, Yilong Zhang, Qiyun Zhu, Rob Knight & J. Gregory Caporaso
Nature Biotechnology volume 37, pages 852–857 (2019)