macでインフォマティクス (Informatics on Mac)

A collection of notes on informatics for HTS (NGS) data.

TriMetAss: extending and assembling sequences of interest in metagenomic data

 

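From the homepage

 TriMetAss is an extension to the Trinity software that selects and assembles the regions surrounding interesting features in metagenomic data. It is particularly useful for very common, well-conserved genes (and, in theory, non-coding regions) that may occur in multiple contexts within the microbial community under study. It extends seed reads (or contigs) into longer contigs through repeated calls to Vmatch and Trinity. There is currently no thorough documentation for TriMetAss, but a README.txt file and the "-h" option are provided (the authors ask to be contacted directly if these are not enough). To use TriMetAss, download and install the Vmatch and Trinity software packages.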

 

Homepage

https://microbiology.se/2014/11/12/trimetass-a-trinity-based-targeted-metagenomics-assembler/

 

Installation

Dependencies

#1 Here I reuse a conda environment previously created for Trinity
conda activate trinity

#2 Install Vmatch (via conda)
mamba install -c bioconda -y vmatch
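
Before moving on, it may be worth confirming that the dependencies are visible in the active environment. The following is a minimal sketch, not part of TriMetAss: `missing_tools` is a hypothetical helper, and the executable names `Trinity`, `vmatch`, and `mkvtree` are assumptions about what the conda packages install.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names below are assumptions about the conda packages.
missing=$(missing_tools Trinity vmatch mkvtree)
if [ -n "$missing" ]; then
    echo "not found on PATH:" $missing
else
    echo "all dependencies found"
fi
```

If anything is reported as missing, check that the intended conda environment is the one currently activated.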

Next, install TriMetAss itself.

Homepage (the download link is partway down the page)

https://microbiology.se/software/trimetass/

cd TriMetAss_1.3/
chmod 755 trimetass
chmod 755 fetch_pairs
=> Copy trimetass and fetch_pairs into a directory that is on your PATH, or add this directory to your PATH
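
The second option (adding the directory to PATH) could look like the sketch below. `prepend_path` is a hypothetical helper, and the example assumes it is run from inside the unpacked `TriMetAss_1.3/` directory. The change only lasts for the current shell session unless the same call is added to a shell startup file.

```shell
# Prepend a directory to PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Run from inside TriMetAss_1.3/ so that $PWD is the directory holding
# the trimetass and fetch_pairs scripts.
prepend_path "$PWD"
```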

> trimetass -h

TriMetAss - Trinity-based Metagenomics Assembler

by Johan Bengtsson-Palme, University of Gothenburg

Version: 1.3 beta

-----------------------------------------------------------------

Usage: trimetass -s <seed sequence file> -1 <input file 1> -2 <input file 2> -o <output files base>

Options:

-1 <input file> : the path to the input file containing the first reads to assemble

-2 <input file> : the path to the input file containing the second reads to assemble

-i <input file> : the path to the input file containing non-paired sequences to assemble

-s <input file> : the path to a file containing non-paired seed sequences to start assembly from

-o <output file> : the base name of the assembly output file(s)

 

--cpu <integer> : number of CPUs to use for parallelizable tasks, default = 1

-e <e-value> : sets the overlap using a cutoff for expected number of randomly assembled overlaps, default = 1e-20

--overlap <integer> : sets the minimum overlap length to be aligned manually (not using e-value)

--contig_min <integer> : sets the minimum length to output a contig, default = 300

--iterations <integer> : sets the maximum number of iterations before finishing assembly regardless of other stop criteria, default = 100;

--stop_total <integer> : sets the minimum number of nucleotides that must be gained to continue assembly, can be negative to allow loss, default = -1000;

--stop_mean <integer> : sets the minimum number of nucleotides that the mean contig must gain to continue assembly, can be negative to allow loss, default = -1000000;

--stop_max <integer> : sets the minimum number of nucleotides that the longest contig must gain to continue assembly, can be negative to allow loss, default = -1000000;

--stop_reads <integer> : sets the minimum number of candidate reads that must be gained to continue assembly, can be negative to allow loss, default = 0;

--stop_contigs <integer> : sets the minimum number of assembled contigs that must be gained to continue assembly, can be negative to allow loss, default = -1000000;

--ignore_candidates : will not stop TriMetAss if the candidates are the same as in the last iteration, default is to stop;

--auto : figure out which reads that are paired based on read IDs instead of based on which file they belonged to

         using the --auto option allows non-paired reads to be included among the the reads in the paired files.

 

-m <integer> : maximum number of reads used in Trinity per graph (not used in newer versions of Trinity), default = 1000000

-p <integer> : maximum number of outgoing paths per node used in Trinity (not used in newer versions of Trinity), default = 100

--insert_min <integer> : sets the shortest possible insert length between a pair of reads.

                         A value of 0 will also allow reads to be overlapping, default = 0

--insert_max <integer> : sets the longest possible insert length between a pair of reads.

                         A value of 0 will remove the maximum limit, default = 1000

--no_triplet_lock : turns off triplet lock in Trinity, on by default

-v : verbose output, default is progress bar

--save_raw : if specified, TriMetAss will not remove any intermediate data generated

--rerun <integer>: if specified, TriMetAss will use the sequence data in the output directory (kept using the --save_raw option) to speed up the process.

                   The integer will be interpreted as the starting round, 0 will rerun all except the sequence loading

 

-h : displays short usage information

--help : displays this help message

--bugs : displays the bug fixes and known bugs in this version

--license : displays licensing information

-----------------------------------------------------------------
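
The stop criteria listed in the help text can be combined on one command line. The sketch below only prints a command so it can be inspected before running. `trimetass_cmd` is a hypothetical helper, and the values (20 iterations, a minimum total gain of 100 nt, 500 nt minimum contig length) are illustrative assumptions, not recommendations from the author.

```shell
# Print a trimetass command line that tightens the default stop criteria.
# $1 = seed FASTA, $2 = first-read FASTA, $3 = second-read FASTA,
# $4 = output base name. All option values are illustrative only.
trimetass_cmd() {
    printf 'trimetass -s %s -1 %s -2 %s -o %s --iterations 20 --stop_total 100 --contig_min 500\n' \
        "$1" "$2" "$3" "$4"
}

# Dry run: show the command without executing it.
trimetass_cmd seed_seq.fna R1.fa R2.fa outdir
```

Going by the help text, `--iterations` caps the number of rounds regardless of the other criteria, while `--stop_total` sets how many nucleotides must be gained for the assembly to continue.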

 

 

Usage

Specify the seed sequences (contigs, genes, etc.) and the paired-end reads in FASTA format.

trimetass -s seed_seq.fna -1 R1.fa -2 R2.fa -o outdir
  • -1     the path to the input file containing the first reads to assemble
  • -2     the path to the input file containing the second reads to assemble
  • -i      the path to the input file containing non-paired sequences to assemble
  • -s     the path to a file containing non-paired seed sequences to start assembly from
  • -o    the base name of the assembly output file(s)
  • --cpu    number of CPUs to use for parallelizable tasks, default = 1
  • --save_raw    if specified, TriMetAss will not remove any intermediate data generated
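
The command above is given FASTA files. If the reads are only available as FASTQ, one way to convert them is sketched below. This assumes standard four-line FASTQ records (sequence lines not wrapped); `fastq_to_fasta` and the `R1.fastq` / `R2.fastq` file names are hypothetical.

```shell
# Convert four-line FASTQ records on stdin to FASTA on stdout.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header: swap leading @ for >
         NR % 4 == 2 { print }                      # sequence line
        '
}

# Convert each mate file if it exists (file names are placeholders).
for r in R1 R2; do
    if [ -f "$r.fastq" ]; then
        fastq_to_fasta < "$r.fastq" > "$r.fa"
    fi
done
```

The quality lines are simply dropped, so any quality trimming should be done before this step.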

 

Citation

https://microbiology.se/software/trimetass/

 

Reference paper

Metagenomic assemblies tend to break around antibiotic resistance genes
 Anna Abramova,  Antti Karkman,  Johan Bengtsson-Palme
doi: https://doi.org/10.1101/2023.12.13.571436

https://www.biorxiv.org/content/10.1101/2023.12.13.571436v1.full

 

 
