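gotranseq translates nucleic acid sequences into the corresponding peptide sequences. It is similar to EMBOSS transeq, but written in Go.
EMBOSS transeq is a great tool, but it silently truncates sequence IDs that contain characters such as ':' and renames sequence IDs that contain characters such as '|', which can be very painful for some use cases. This tool is an attempt to solve that problem. It is also parallelized and much faster than EMBOSS transeq.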
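To see whether a given input is affected by the ID problem described above, a quick check like the following can help. It is only a rough sketch: the function name `risky_ids` is made up for illustration, and it assumes that the sequence ID is the first whitespace-delimited token of each FASTA header line.

```python
def risky_ids(fasta_text, risky_chars=":|"):
    """Return the sequence IDs that contain any of the given characters.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    found = []
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # skip sequence lines
        tokens = line[1:].split()
        seq_id = tokens[0] if tokens else ""
        if any(ch in seq_id for ch in risky_chars):
            found.append(seq_id)
    return found


if __name__ == "__main__":
    example = ">gi|123|ref some description\nACGT\n>chr1:100-200\nACGT\n>plain\nACGT\n"
    for seq_id in risky_ids(example):
        print(seq_id)
```

If this reports any IDs, those are the ones worth comparing between the input and the translated output.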
Installation
I downloaded the OSX binary from the releases page.
Dependencies
Building from source with the commands below requires a Go toolchain.
git clone https://github.com/feliixx/gotranseq.git
cd gotranseq
go install
$ gotranseq -h
gotranseq version 0.2.1
Usage:
gotranseq
required:
-s, --sequence=<filename> Nucleotide sequence(s) filename
-o, --outseq=<filename> Protein sequence filename
optional:
-f, --frame=<code> Frame to translate. Possible values:
[1, 2, 3, F, -1, -2, -3, R, 6]
F: forward three frames
R: reverse three frames
6: all 6 frames
(default: 1)
-t, --table=<code> NCBI code to use, see https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG1 for details. Available codes:
0: Standard code
2: The Vertebrate Mitochondrial Code
3: The Yeast Mitochondrial Code
4: The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code
5: The Invertebrate Mitochondrial Code
6: The Ciliate, Dasycladacean and Hexamita Nuclear Code
9: The Echinoderm and Flatworm Mitochondrial Code
10: The Euplotid Nuclear Code
11: The Bacterial, Archaeal and Plant Plastid Code
12: The Alternative Yeast Nuclear Code
13: The Ascidian Mitochondrial Code
14: The Alternative Flatworm Mitochondrial Code
16: Chlorophycean Mitochondrial Code
21: Trematode Mitochondrial Code
22: Scenedesmus obliquus Mitochondrial Code
23: Thraustochytrium Mitochondrial Code
24: Pterobranchia Mitochondrial Code
25: Candidate Division SR1 and Gracilibacteria Code
26: Pachysolen tannophilus Nuclear Code
29: Mesodinium Nuclear
30: Peritrich Nuclear
(default: 0)
-c, --clean Replace stop codon '*' by 'X'
-a, --alternative Define frame '-1' as using the set of codons starting with the last codon of the sequence
-T, --trim Removes all 'X' and '*' characters from the right end of the translation. The trimming process starts at the end and continues until
the next character is not a 'X' or a '*'
-n, --numcpu=<n> Number of worker to use (default: number of CPU)
general:
-h, --help Show this help message
-v, --version Print the tool version and exit
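The descriptions of --clean and --trim in the help text can be illustrated with a small sketch. This is not the tool's own code, just my reading of the two descriptions applied to an already translated protein string. The function names are made up for illustration, and how the tool behaves when both options are combined is not covered here.

```python
def clean(protein: str) -> str:
    """Replace every stop marker '*' with 'X', as described for --clean."""
    return protein.replace("*", "X")


def trim(protein: str) -> str:
    """Remove 'X' and '*' from the right end until another residue is reached,
    as described for --trim. Characters inside the sequence are kept."""
    return protein.rstrip("X*")


if __name__ == "__main__":
    raw = "MKT*LLX*X"
    print(clean(raw))  # MKTXLLXXX
    print(trim(raw))   # MKT*LL
```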
Usage
Specify the file (multi-FASTA) containing the gene sequences to translate.
gotranseq --sequence input.fna --outseq output.faa --frame 6 -t 0 -n 2
- -o, --outseq=<filename> Protein sequence filename
- -n, --numcpu=<n> Number of worker to use (default: number of CPU)
- -f, --frame=<code> Frame to translate. Possible values:
[1, 2, 3, F, -1, -2, -3, R, 6]
F: forward three frames
R: reverse three frames
6: all 6 frames
(default: 1)
- -t, --table=<code> NCBI code to use, see https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG1 for details. The available codes are the ones listed in the help output above (default: 0).
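To make it concrete what "--frame 6" asks for, here is a minimal sketch of six-frame translation with the standard genetic code. It is not gotranseq's implementation. For the negative frames it simply reverse-complements the sequence and reads from the first, second, and third base, which corresponds to the description of --alternative in the help text; the tool's default definition of the negative frames may differ. Incomplete trailing codons are dropped and unknown codons become 'X', both of which are choices made for this sketch.

```python
from itertools import product

# Standard genetic code, in the NCBI base order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate from the first base; drop an incomplete trailing codon."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(seq: str) -> dict:
    """Return {frame: protein} for frames 1, 2, 3, -1, -2, -3."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frames("ATGGCCTAA").items()):
        print(frame, protein)
```

For the toy sequence ATGGCCTAA, frame 1 gives MA* and frame -1 (read from the reverse complement TTAGGCCAT) gives LGH.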
Reference
GitHub - feliixx/gotranseq: convert nucleic sequence in protein sequence