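gotranseq: translating nucleic acid sequences into amino acid sequences

gotranseq translates nucleic acid sequences into their corresponding peptide sequences. It is similar to EMBOSS transeq but is written in Go.

EMBOSS transeq is an excellent tool, but it silently truncates sequence IDs that contain characters such as ':' and renames sequence IDs that contain characters such as '|', which can be very painful in some use cases. This tool is an attempt to solve that problem; it is also parallelized and much faster than EMBOSS transeq.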
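As a rough illustration of the ID problem described above, the following Python sketch scans FASTA header lines and flags sequence IDs containing ':' or '|', the two characters named here. The function name `risky_ids` and the sample headers are made up for illustration; the sketch only inspects IDs and does not reproduce what EMBOSS transeq actually does to them.

```python
def risky_ids(fasta_lines, risky_chars=":|"):
    """Return sequence IDs that contain any of the given characters.

    The ID is taken to be the first whitespace-delimited token after '>'.
    """
    flagged = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        seq_id = fields[0] if fields else ""
        if any(ch in seq_id for ch in risky_chars):
            flagged.append(seq_id)
    return flagged


# Hypothetical headers, for illustration only.
example = [
    ">chr1:100-200 some description",
    "ACGT",
    ">gi|123|ref",
    "ACGT",
    ">plain_id",
    "ACGT",
]
print(risky_ids(example))
```

Running a check like this on the input file before translation shows quickly whether the ID handling matters for a given dataset.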

Installation

I downloaded the OSX binary from the releases page.

Dependencies

GitHub

git clone https://github.com/feliixx/gotranseq.git
cd gotranseq
go install

> gotranseq -h

gotranseq version 0.2.1

Usage:
   gotranseq

required:
-s, --sequence=<filename> Nucleotide sequence(s) filename
-o, --outseq=<filename> Protein sequence filename

optional:
-f, --frame=<code> Frame to translate. Possible values:
   [1, 2, 3, F, -1, -2, -3, R, 6]
   F: forward three frames
   R: reverse three frames
   6: all 6 frames
   (default: 1)
-t, --table=<code> NCBI code to use, see https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG1 for details. Available codes:
   0: Standard code
   2: The Vertebrate Mitochondrial Code
   3: The Yeast Mitochondrial Code
   4: The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code
   5: The Invertebrate Mitochondrial Code
   6: The Ciliate, Dasycladacean and Hexamita Nuclear Code
   9: The Echinoderm and Flatworm Mitochondrial Code
   10: The Euplotid Nuclear Code
   11: The Bacterial, Archaeal and Plant Plastid Code
   12: The Alternative Yeast Nuclear Code
   13: The Ascidian Mitochondrial Code
   14: The Alternative Flatworm Mitochondrial Code
   16: Chlorophycean Mitochondrial Code
   21: Trematode Mitochondrial Code
   22: Scenedesmus obliquus Mitochondrial Code
   23: Thraustochytrium Mitochondrial Code
   24: Pterobranchia Mitochondrial Code
   25: Candidate Division SR1 and Gracilibacteria Code
   26: Pachysolen tannophilus Nuclear Code
   29: Mesodinium Nuclear
   30: Peritrich Nuclear
   (default: 0)
-c, --clean Replace stop codon '*' by 'X'
-a, --alternative Define frame '-1' as using the set of codons starting with the last codon of the sequence
-T, --trim Removes all 'X' and '*' characters from the right end of the translation. The trimming process starts at the end and continues until the next character is not a 'X' or a '*'
-n, --numcpu=<n> Number of worker to use (default: number of CPU)

general:
-h, --help Show this help message
-v, --version Print the tool version and exit
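To make the `--clean` and `--trim` descriptions in the help text concrete, here is a minimal Python sketch of what they describe. This is not gotranseq's own code; the function names `clean` and `trim` are chosen for illustration, and the order in which the tool applies the two options when both are given is not stated in the help, so each is shown on its own.

```python
def clean(protein):
    """Sketch of --clean: replace every stop codon '*' by 'X'."""
    return protein.replace("*", "X")


def trim(protein):
    """Sketch of --trim: remove 'X' and '*' from the right end only.

    Stripping stops at the first character that is neither 'X' nor '*',
    so internal stops are left untouched.
    """
    return protein.rstrip("X*")


protein = "MKP*LLX*X"  # made-up translation, for illustration only
print(trim(protein))
print(clean(protein))
```

Note that trimming affects only the tail of the translation, while cleaning affects every stop codon in it.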

How to run

Specify the file of gene sequences you want to translate (multi-FASTA).

gotranseq --sequence input.fna --outseq output.faa --frame 6 -t 0 -n 2

-o, --outseq=<filename> Protein sequence filename
-n, --numcpu=<n> Number of worker to use (default: number of CPU)
-f, --frame=<code> Frame to translate. Possible values:
   [1, 2, 3, F, -1, -2, -3, R, 6]
   F: forward three frames
   R: reverse three frames
   6: all 6 frames
   (default: 1)
-t, --table=<code> NCBI code to use, see https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG1 for details. Available codes:
   0: Standard code
   2: The Vertebrate Mitochondrial Code
   3: The Yeast Mitochondrial Code
   4: The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code
   5: The Invertebrate Mitochondrial Code
   6: The Ciliate, Dasycladacean and Hexamita Nuclear Code
   9: The Echinoderm and Flatworm Mitochondrial Code
   10: The Euplotid Nuclear Code
   11: The Bacterial, Archaeal and Plant Plastid Code
   12: The Alternative Yeast Nuclear Code
   13: The Ascidian Mitochondrial Code
   14: The Alternative Flatworm Mitochondrial Code
   16: Chlorophycean Mitochondrial Code
   21: Trematode Mitochondrial Code
   22: Scenedesmus obliquus Mitochondrial Code
   23: Thraustochytrium Mitochondrial Code
   24: Pterobranchia Mitochondrial Code
   25: Candidate Division SR1 and Gracilibacteria Code
   26: Pachysolen tannophilus Nuclear Code
   29: Mesodinium Nuclear
   30: Peritrich Nuclear
   (default: 0)
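The frame values listed above can be illustrated with a small Python sketch that uses the standard genetic code (listed as code 0 in the help). This is an illustrative reimplementation, not gotranseq's code. It assumes that the default reverse frames follow the EMBOSS transeq convention (frame -1 is the reverse complement of the codons used in frame 1, and likewise for -2 and -3) and that `--alternative` means translating the reverse complement starting from the last codon of the sequence. For simplicity, a trailing partial codon is ignored and any codon not in the table becomes 'X'; the real tools may handle these cases differently.

```python
from itertools import product

# NCBI standard genetic code, written in the usual TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Shorthand frame codes from the help text.
FRAME_SETS = {"F": (1, 2, 3), "R": (-1, -2, -3), "6": (1, 2, 3, -1, -2, -3)}


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def translate_frame(seq, frame, alternative=False):
    """Translate one frame (1, 2, 3, -1, -2 or -3) of a DNA sequence."""
    seq = seq.upper()
    offset = abs(frame) - 1
    if frame > 0:
        return translate(seq[offset:])
    if alternative:
        # Assumed --alternative behaviour: start from the last codon.
        return translate(revcomp(seq)[offset:])
    # Assumed default: reuse the codons of the matching forward frame.
    usable = (len(seq) - offset) // 3 * 3
    return translate(revcomp(seq[offset:offset + usable]))


seq = "ATGAAACCCGG"  # made-up 11-nt sequence, for illustration only
for frame in FRAME_SETS["6"]:
    print(frame, translate_frame(seq, frame))
print("-1 (alternative)", translate_frame(seq, -1, alternative=True))
```

Under these assumptions, the two definitions of frame -1 give different results whenever the sequence length is not a multiple of 3, which is the situation `--alternative` is meant to address.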