This blog collects informatics notes related to HTS (NGS).

est2genome: an EMBOSS tool for aligning EST sequences to a genome






emboss explorer





conda install -c bioconda -y emboss

brew install emboss

est2genome -h

$ est2genome -h
Align EST sequences to genomic DNA sequence
Version: EMBOSS:

   Standard (Mandatory) qualifiers:
  [-estsequence]       seqall     Spliced EST nucleotide sequence(s)
  [-genomesequence]    sequence   Unspliced genomic nucleotide sequence
  [-outfile]           outfile    [*.est2genome] Output file name

   Additional (Optional) qualifiers:
   -match              integer    [1] Score for matching two bases (Any
                                  integer value)
   -mismatch           integer    [1] Cost for mismatching two bases (Any
                                  integer value)
   -gappenalty         integer    [2] Cost for deleting a single base in
                                  either sequence, excluding introns (Any
                                  integer value)
   -intronpenalty      integer    [40] Cost for an intron, independent of
                                  length. (Any integer value)
   -splicepenalty      integer    [20] Cost for an intron, independent of
                                  length and starting/ending on donor-acceptor
                                  sites (Any integer value)
   -minscore           integer    [30] Exclude alignments with scores below
                                  this threshold score. (Any integer value)

   Advanced (Unprompted) qualifiers:
   -reverse            boolean    Reverse the orientation of the EST sequence
   -[no]usesplice      boolean    [Y] Use donor and acceptor splice sites. If
                                  you want to ignore donor-acceptor sites then
                                  set this to be false.
   -mode               menu       [both] This determines the comparison mode.
                                  The default value is 'both', in which case
                                  both strands of the est are compared
                                  assuming a forward gene direction (ie GT/AG
                                  splice sites), and the best comparison
                                  redone assuming a reversed (CT/AC) gene
                                  splicing direction. The other allowed modes
                                  are 'forward', when just the forward strand
                                  is searched, and 'reverse', ditto for the
                                  reverse strand. (Values: both (Both
                                  strands); forward (Forward strand only);
                                  reverse (Reverse strand only))
   -[no]best           boolean    [Y] You can print out all comparisons
                                  instead of just the best one by setting this
                                  to be false.
   -space              float      [10.0] For linear-space recursion. If
                                  product of sequence lengths divided by 4
                                  exceeds this then a divide-and-conquer
                                  strategy is used to control the memory
                                  requirements. In this way very long
                                  sequences can be aligned.
                                  If you have a machine with plenty of memory
                                  you can raise this parameter (but do not
                                  exceed the machine's physical RAM) (Any
                                  numeric value)
   -shuffle            integer    [0] Shuffle (Any integer value)
   -seed               integer    [20825] Random number seed (Any integer
                                  value)
   -align              boolean    Show the alignment. The alignment includes
                                  the first and last 5 bases of each intron,
                                  together with the intron width. The
                                  direction of splicing is indicated by angle
                                  brackets (forward or reverse) or ????
   -width              integer    [50] Alignment width (Any integer value)

   General qualifiers:
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
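
As a rough illustration of how the scoring qualifiers in the help text combine, the sketch below tallies a score for a made-up alignment using the default values listed above. The event counts are invented, the helper name alignment_score is hypothetical, and the simple additive formula is an assumption about how the costs add up, not a reimplementation of est2genome.

```shell
# Default values copied from the est2genome help text.
match=1; mismatch=1; gappenalty=2; intronpenalty=40; splicepenalty=20

# alignment_score MATCHES MISMATCHES GAPS SPLICED_INTRONS OTHER_INTRONS
# Hypothetical helper. Assumption: matches add to the score and every other
# event subtracts its cost. Introns that start/end on donor-acceptor sites
# pay splicepenalty; all other introns pay intronpenalty.
alignment_score() {
  n_match=$1; n_mismatch=$2; n_gap=$3; n_spliced=$4; n_other=$5
  gain=$(( n_match * match ))
  cost=$(( n_mismatch * mismatch + n_gap * gappenalty ))
  introns=$(( n_spliced * splicepenalty + n_other * intronpenalty ))
  echo $(( gain - cost - introns ))
}

# Invented example: 300 matches, 3 mismatches, 1 gap, 2 introns on GT/AG sites.
alignment_score 300 3 1 2 0
```

Under these assumptions the call prints 255 (300 - 3 - 2 - 40), which is well above the default -minscore of 30, so such an alignment would be kept.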






est2genome est.fa genome.fa output.aln 
  • -[no]best  boolean  [Y] Only the best comparison is reported by default. Pass -nobest to print all comparisons.
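
The positional command above can also be written with the named qualifiers from the help output, which makes it easier to append options such as -nobest or -align. This is a minimal sketch: run_est2genome is a hypothetical helper, est.fa, genome.fa and output.aln are the same placeholder file names as above, and est2genome is only executed when the program and both input files are actually present.

```shell
# run_est2genome EST GENOME OUT [extra qualifiers...]
# Hypothetical helper: builds an est2genome call with named qualifiers,
# prints it, and runs it only if EMBOSS and the input files are available.
run_est2genome() {
  est=$1; genome=$2; out=$3
  shift 3
  # Replace the function's arguments with the full command line.
  set -- est2genome -estsequence "$est" -genomesequence "$genome" \
         -outfile "$out" "$@"
  echo "$*"   # show the command that would be run
  if command -v est2genome >/dev/null 2>&1 && [ -f "$est" ] && [ -f "$genome" ]; then
    "$@"
  fi
}

# Report all comparisons (-nobest) and include the alignment (-align).
run_est2genome est.fa genome.fa output.aln -nobest -align
```

Other qualifiers from the help output can be appended the same way, for example -mode forward to search only the forward strand.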






EMBOSS: the European Molecular Biology Open Software Suite.
Rice P, Longden I, Bleasby A.
Trends Genet. 2000 Jun;16(6):276-7.