[antlr-interest] Using long literal definitions in ANTLR

Micheal J open.zone at virgin.net
Fri Nov 24 07:27:06 PST 2006


Hi,
 
Any reason why you're still using ANTLR 2.7.2? I'd recommend V3 for what
you're doing (or at least ANTLR 2.7.6+ if you must use ANTLR 2.7.x).
 
You could try something like (check for typos/syntax)
 
class TestLexer extends...
 
options {...}
 
tokens {
   BEGIN;
   END;
    ....
   FRAMESTYLE;
   STYLE;
}
 
FORMS_LITERALS
    : "BEGIN"        { $setType(BEGIN); }
    | "END"          { $setType(END); }
    .......
    | "FRAMESTYLE"   { $setType(FRAMESTYLE); }
    | "STYLE"        { $setType(STYLE); }
    ;
 
 
Micheal
-----------------------
The best way to contact me is via the list/forum. My time is very limited. 

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Andrew Monaghan
Sent: 24 November 2006 00:35
To: antlr-interest at antlr.org
Subject: [antlr-interest] Using long literal definitions in ANTLR



I'm using an older version of ANTLR (2.7.2) to build a relatively simple forms
parser, but I've come across an issue that has stumped me for a couple of
days now.  I'm no expert on parsers, so I'm probably missing something very
obvious, and any help would be gratefully received.  The issue stems from a
set of long, similar literals in the form definition.

 

The grammar has a number of definitions for 'form' style, including
"DOUBLE_THIN", "DOUBLE_THICK", "DOUBLE" and "DOUBLEUNDER".  To let ANTLR
distinguish between these long literals I've upped k to 11, but I'm still
getting nondeterminism between rules FRAMESTYLETYPE and STYLETYPE.  And when
the parser is run I get an exception in the generated nextToken() method in
the lexer.

 

For example, the parser may be parsing a STYLE line and expect one of the
STYLETYPE literals to follow, but due to the order of the rules in the grammar
nextToken() matches the DOUBLE prefix (from DOUBLE_THIN) against
FRAMESTYLETYPE first and throws an exception.  Changing the rule order isn't
an option because it would throw an exception on the FRAMESTYLE line instead.

 

I'm sure this is a problem because I've increased k, but I'm at a loss for any
alternative strategies.

 

Cheers,

 

Andy

 

---------------------------------------------------------

Below is a simplified version of the grammar...

 

options
{
language = "CSharp";
}

class TestParser extends Parser;

form:             (formbody)+ ;
formbody:         BEGIN formcontent END  ;
formcontent:      (formentry)+ ;

formentry :       styleline
                  | framestyleline;

styleline:              STYLE style1:STYLETYPE;
framestyleline:         FRAMESTYLE style:FRAMESTYLETYPE;

class TestLexer extends Lexer;

options
{
      k = 11;
      charVocabulary = '\3'..'\377';
      charVocabulary = '\u0000'..'\uFFFE';
}

BEGIN:                  "BEGIN" ;
END:                    "END" ;
STYLE:                  "STYLE" ;
FRAMESTYLE:             "FRAMESTYLE" ;

FRAMESTYLETYPE:         "SINGLE_THIN" | "DOUBLE_THIN" | "SINGLE_THICK" |
                        "DOUBLE_THICK" | "DOTTED" ;

STYLETYPE:              "NORMAL" | "BOLD" | "ITALIC" | "UNDER" |
                        "DOUBLEUNDER" | "DOUBLE" | "TRIPLE" | "QUADRUPLE" |
                        "STRIKETHROUGH" | "ROTATE90" | "ROTATE270" |
                        "UPSIDEDOWN" | "PROPORTIONAL" | "DOUBLEHIGH" |
                        "TRIPLEHIGH" | "QUADRUPLEHIGH" | "CONDENSED" |
                        "SUPERSCRIPT" | "OVERSCORE" | "LETTERQUALITY" |
                        "NEARLETTERQUALITY" | "DOUBLESTRIKE" | "OPAQUE" ;

WS : (' ' | '\t')+  { $setType(Token.SKIP); }
            ;

NEWLINE
    :   '\r' '\n' { newline(); $setType(Token.SKIP);}
    |   '\n' { newline(); $setType(Token.SKIP);}
    |   '\r' { newline(); $setType(Token.SKIP);}
     ;

    

------------------------------------------------

 

      public new Token nextToken()              //throws TokenStreamException
      {
            ...
                  default:
                        if ((LA(1)=='D'||LA(1)=='S') && (LA(2)=='I'||LA(2)=='O') &&
                            (LA(3)=='N'||LA(3)=='T'||LA(3)=='U') &&
                            (LA(4)=='B'||LA(4)=='G'||LA(4)=='T') &&
                            (LA(5)=='E'||LA(5)=='L') && (LA(6)=='D'||LA(6)=='E') &&
                            (true) && (true) && (true) && (true) && (true))
                        {
                              mFRAMESTYLETYPE(true);
                              theRetToken = returnToken_;
                        }
                        else if ((tokenSet_0_.member(LA(1))) && (tokenSet_1_.member(LA(2))) &&
                                 (tokenSet_2_.member(LA(3))) && (tokenSet_3_.member(LA(4))) &&
                                 (true) && (true) && (true) && (true) && (true) && (true) && (true)) {
                              mSTYLETYPE(true);
                              theRetToken = returnToken_;
                        }
                        else if ((LA(1)=='B') && (LA(2)=='E') && (LA(3)=='G')) {
                              mBEGIN(true);
                              theRetToken = returnToken_;
                        }
                        else if ((LA(1)=='S') && (LA(2)=='T') && (LA(3)=='Y')) {
                              mSTYLE(true);
                              theRetToken = returnToken_;
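
To make the behaviour of the first test in the generated nextToken() easier to
see, the following Python sketch re-creates just that condition. The
per-position character sets are copied from the generated code above; the
names FRAMESTYLETYPE_PREDICT and predicts_framestyletype are invented for this
illustration, and the sketch assumes that lookahead positions 7 to 11 place no
constraint, since the generated test emits them as (true).

```python
# Per-position character sets copied from the first `if` in the generated
# nextToken(): LA(1) in {D,S}, LA(2) in {I,O}, ... LA(6) in {D,E}.
# Positions 7..11 appear as (true) in the generated code, so they are omitted.
FRAMESTYLETYPE_PREDICT = ["DS", "IO", "NTU", "BGT", "EL", "DE"]


def predicts_framestyletype(text):
    """Return True when the first six characters pass the generated test.

    Simplification: input shorter than six characters is treated as a
    non-match, whereas the real lexer would look at whatever follows.
    """
    if len(text) < len(FRAMESTYLETYPE_PREDICT):
        return False
    return all(ch in allowed for ch, allowed in zip(text, FRAMESTYLETYPE_PREDICT))


if __name__ == "__main__":
    for word in ["DOUBLE_THIN", "DOUBLE", "DOUBLEUNDER", "DOUBLEHIGH", "NORMAL"]:
        print(word, predicts_framestyletype(word))
```

In this sketch every literal that begins with DOUBLE passes the test, including
the STYLETYPE literals "DOUBLE", "DOUBLEUNDER" and "DOUBLEHIGH". That is
consistent with the generated code calling mFRAMESTYLETYPE for them before the
STYLETYPE branch is ever considered.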
