2020 10/26: typos corrected
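This program finds and outputs the sequences of open reading frames (ORFs) in one or more nucleotide sequences. An ORF can be defined either as a region of a specified minimum size between two STOP codons, or as a region between a START codon and a STOP codon. ORFs can be output as nucleotide sequences or as protein translations. Optionally, the program instead outputs the region flanking the START codon, the first STOP codon, or the last STOP codon of each ORF. The START and STOP codons are defined by the Genetic Code table in use. The output is a sequence file containing the predicted ORFs that are at least the minimum size, which defaults to 30 bases (i.e. 10 amino acids).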
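To make the two ORF definitions concrete, here is a minimal Python sketch. It is an illustration only, not getorf's actual implementation. It assumes the standard genetic code, treats ATG as the only START codon, scans the forward strand only, and treats the ends of the sequence as region boundaries. The function name `find_orfs` and its parameters are hypothetical.

```python
# Illustrative sketch only: this is NOT how getorf is implemented.
STOPS = {"TAA", "TAG", "TGA"}  # STOP codons of the standard genetic code
START = "ATG"                  # assumption: ATG is the only START codon


def find_orfs(seq, minsize=30, start_to_stop=False):
    """Return (start, end, orf) tuples; coordinates are 0-based, end-exclusive.

    start_to_stop=False: ORF = stop-free region between two STOP codons.
    start_to_stop=True:  ORF = region from the first START codon to the STOP.
    Only the three forward frames are scanned (assumption of this sketch).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        # Split the frame into complete codons
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        begin = 0  # codon index where the current stop-free region starts
        for idx in range(len(codons) + 1):
            at_end = idx == len(codons)
            if not at_end and codons[idx] not in STOPS:
                continue
            # A STOP codon (or the end of the sequence) closes the region
            region = codons[begin:idx]
            first = 0
            if start_to_stop:
                # Drop everything before the first START codon, if there is one
                first = region.index(START) if START in region else len(region)
            orf = "".join(region[first:])
            if orf and len(orf) >= minsize:
                start = frame + 3 * (begin + first)
                orfs.append((start, start + len(orf), orf))
            begin = idx + 1
    return orfs


print(find_orfs("CCCATGAAACCCTAGGG", minsize=9, start_to_stop=True))
# [(3, 12, 'ATGAAACCC')]
```

With `start_to_stop=False` the same frame-0 region is reported from its first codon (`CCCATGAAACCC`), which is the difference between the two definitions.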
emboss getorf
http://emboss.sourceforge.net/apps/cvs/emboss/apps/getorf.html
Installation
It can be installed with conda or brew.
#bioconda
conda install -c bioconda -y emboss
#homebrew
brew install emboss
$ getorf -h
Find and extract open reading frames (ORFs)
Version: EMBOSS:6.6.0.0
Standard (Mandatory) qualifiers:
[-sequence] seqall Nucleotide sequence(s) filename and optional
format, or reference (input USA)
[-outseq] seqoutall [<sequence>.<format>] Protein sequence
set(s) filename and optional format (output
USA)
Additional (Optional) qualifiers:
-table menu [0] Code to use (Values: 0 (Standard); 1
(Standard (with alternative initiation
codons)); 2 (Vertebrate Mitochondrial); 3
(Yeast Mitochondrial); 4 (Mold, Protozoan,
Coelenterate Mitochondrial and
Mycoplasma/Spiroplasma); 5 (Invertebrate
Mitochondrial); 6 (Ciliate Macronuclear and
Dasycladacean); 9 (Echinoderm
Mitochondrial); 10 (Euplotid Nuclear); 11
(Bacterial); 12 (Alternative Yeast Nuclear);
13 (Ascidian Mitochondrial); 14 (Flatworm
Mitochondrial); 15 (Blepharisma
Macronuclear); 16 (Chlorophycean
Mitochondrial); 21 (Trematode
Mitochondrial); 22 (Scenedesmus obliquus);
23 (Thraustochytrium Mitochondrial))
-minsize integer [30] Minimum nucleotide size of ORF to
report (Any integer value)
-maxsize integer [1000000] Maximum nucleotide size of ORF to
report (Any integer value)
-find menu [0] This is a small menu of possible output
options. The first four options are to
select either the protein translation or the
original nucleic acid sequence of the open
reading frame. There are two possible
definitions of an open reading frame: it can
either be a region that is free of STOP
codons or a region that begins with a START
codon and ends with a STOP codon. The last
three options are probably only of interest
to people who wish to investigate the
statistical properties of the regions around
potential START or STOP codons. The last
option assumes that ORF lengths are
calculated between two STOP codons. (Values:
0 (Translation of regions between STOP
codons); 1 (Translation of regions between
START and STOP codons); 2 (Nucleic sequences
between STOP codons); 3 (Nucleic sequences
between START and STOP codons); 4
(Nucleotides flanking START codons); 5
(Nucleotides flanking initial STOP codons);
6 (Nucleotides flanking ending STOP codons))
Advanced (Unprompted) qualifiers:
-[no]methionine boolean [Y] START codons at the beginning of protein
products will usually code for Methionine,
despite what the codon will code for when it
is internal to a protein. This qualifier
sets all such START codons to code for
Methionine by default.
-circular boolean [N] Is the sequence circular
-[no]reverse boolean [Y] Set this to be false if you do not wish
to find ORFs in the reverse complement of
the sequence.
-flanking integer [100] If you have chosen one of the options
of the type of sequence to find that gives
the flanking sequence around a STOP or START
codon, this allows you to set the number of
nucleotides either side of that codon to
output. If the region of flanking
nucleotides crosses the start or end of the
sequence, no output is given for this codon.
(Any integer value)
General qualifiers:
-help boolean Report command line options and exit. More
information on associated and general
qualifiers can be found with -help -verbose
Usage
Specify the input sequence (FASTA) first, then the output file. Running the command with no arguments starts an interactive prompt instead.
getorf inputseq.fasta output
- -minsize Minimum nucleotide size of ORF to report [30]
- -maxsize Maximum nucleotide size of ORF to report [1000000]
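The positional form above can also be written with explicit qualifiers. The sketch below assembles such a command line in Python, using only qualifiers that appear in the `getorf -h` output (`-sequence`, `-outseq`, `-minsize`, `-maxsize`, `-find`, `-table`). The helper names `build_getorf_command` and `run_getorf` and the example values (300 nt minimum, `-find 1`, `-table 11`) are illustrative assumptions, not recommendations from the EMBOSS documentation.

```python
import shutil
import subprocess


def build_getorf_command(infile, outfile, minsize=30, maxsize=1000000,
                         find=0, table=0):
    """Assemble a getorf argument list (hypothetical helper).

    Defaults mirror the values shown in square brackets by `getorf -h`.
    """
    return [
        "getorf",
        "-sequence", infile,
        "-outseq", outfile,
        "-minsize", str(minsize),
        "-maxsize", str(maxsize),
        "-find", str(find),    # 1 = translation between START and STOP codons
        "-table", str(table),  # 11 = Bacterial genetic code
    ]


def run_getorf(cmd):
    """Run the command, failing early if getorf is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("getorf not found on PATH; install EMBOSS first")
    return subprocess.run(cmd, check=True)


cmd = build_getorf_command("inputseq.fasta", "output",
                           minsize=300, find=1, table=11)
print(" ".join(cmd))
# To execute it: run_getorf(cmd)
```

Building the argument list separately from running it makes it easy to inspect or log the exact command before anything is executed.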
Related EMBOSS commands include tcode, sixpack, checktrans, showorf, and plotorf.
Citation
EMBOSS: the European Molecular Biology Open Software Suite.
Rice P, Longden I, Bleasby A
Trends Genet. 2000 Jun;16(6):276-7
Related