[antlr-interest] How to swich the recognition scope in Lexer

Silvester Pozarnik silvester.pozarnik at tracetracker.com
Wed Jun 20 08:05:02 PDT 2007



> -----Original Message-----
> From: Thomas Brandon [mailto:tbrandonau at gmail.com]
> Sent: 20. juni 2007 15:52
> To: Silvester Pozarnik
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] How to swich the recognition scope in Lexer
> 
> On 6/20/07, Silvester Pozarnik <silvester.pozarnik at tracetracker.com>
> wrote:
> > >> Silvester Pozarnik wrote this on [20 June 2007 13:00]:
> > >>
> > >> In antlr 2.7.7 you could change the behaviour of the Lexer so
> > >> that tokens are recognized as literals in special cases by
> > >> overriding the testLiteralsTable() method in the CharScanner
> > >> class. How can the same be done in antlr 3.0 with a grammar
> > >> such as:
> > >>
> > >>      grammar test;
> > >>      tokens {
> > >>              MYTOKEN = 'mytoken';
> > >>      }
> > >>      mygrammar:
> > >>              MYTOKEN LPAREN IDENTIFIER RPAREN
> > >>              ;
> > >>
> > >>      LPAREN   : '(' ;
> > >>      RPAREN   : ')' ;
> > >>      IDENTIFIER
> > >>              : ('a'..'z' | 'A'..'Z' | '\u0080'..'\ufffe') (
> > >> Letter | Digit)*;
> > >>
> > >>      fragment Letter
> > >>              : 'a'..'z' | 'A'..'Z' | '_' | '-' | '\u0080'..'\ufffe';
> > >>
> > >>      fragment Digit
> > >>              : '0'..'9';
> > >>
> > >> The input "mytoken(mytoken)" should be valid. The first
> > >> 'mytoken' should be recognized as MYTOKEN, but the second
> > >> 'mytoken' has to be recognized as an IDENTIFIER. Is there a
> > >> way to achieve this?
> >
> >
> > >
> > >Not to my knowledge (and this applies to V2.x too). I suspect you
> > >need to change your 'mygrammar' rule:
> > >
> > >       mygrammar : MYTOKEN LPAREN (MYTOKEN|IDENTIFIER) RPAREN
> > >
> > >Micheal
> >
> > Hei Micheal,
> >
> > The rule change you proposed would not work, as the input is still
> > nondeterministic when processed by the Lexer ("should I recognize an
> > IDENTIFIER or MYTOKEN!?"). I'm not sure what takes precedence here.
> > The proposed parser rule also alters the nature of the language.
> >
> > BR.
> > Silvester Pozarnik
> >
> 
> In ANTLR 3 lexers the rule which is mentioned first will take
> precedence with no warnings given. Literals specified in tokens
> section have precedence over explicit lexer rules. So MYTOKEN will
> take precedence. As far as I can see Michael's proposed solution
> should work fine for your needs. To generalise you could do something
> like:
> 
> mygrammar: MYTOKEN1 LPAREN idOrKeyword RPAREN;
> idOrKeyword: (IDENTIFIER|MYTOKEN1|MYTOKEN2) {LT(-1).setType(IDENTIFIER);};
> 
> where MYTOKEN1, MYTOKEN2 etc. are your keywords. Wherever keywords are
> allowed as identifiers, use idOrKeyword rather than IDENTIFIER. The
> action (unsure of exact syntax there) retags the matched token as
> IDENTIFIER, so later phases don't need to deal with this.
> Or you can have keywords recognised as IDENTIFIER in your lexer and
> then use predicates to test the text in your parser. Something like:
> 
> mygrammar: myToken LPAREN IDENTIFIER RPAREN;
> myToken: {input.LT(1).getText().equals("mytoken")}? IDENTIFIER {
> input.LT(-1).setType(MYTOKEN);};
> 
> Tom.


The first solution with "{LT(-1).setType(IDENTIFIER);}" did the trick.

Thanks a lot, Tom!

BR
Silvester
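
For readers who want to see the idea behind the first solution without
an ANTLR runtime, here is a minimal plain-Java sketch. It is not the
generated parser and makes several assumptions: the Token class, the
Type enum, and the lex/parse/idOrKeyword methods are hypothetical
stand-ins written for this illustration, and the character classes only
approximate the IDENTIFIER rule from the grammar above. The lexer always
tags "mytoken" as MYTOKEN (the literal wins), and the parser retags it
as IDENTIFIER in the one position where an identifier is expected.

```java
import java.util.ArrayList;
import java.util.List;

public class KeywordAsIdentifier {
    // Token types; the names mirror the grammar in the thread.
    enum Type { MYTOKEN, IDENTIFIER, LPAREN, RPAREN }

    // Hypothetical stand-in for an ANTLR token: mutable type plus text.
    static final class Token {
        Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    // Approximation of the Letter | Digit fragments from the grammar.
    static boolean isIdPart(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-';
    }

    // Lexer stage: the keyword literal always wins over IDENTIFIER,
    // so every occurrence of "mytoken" comes out as MYTOKEN.
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '(') {
                tokens.add(new Token(Type.LPAREN, "("));
                i++;
            } else if (c == ')') {
                tokens.add(new Token(Type.RPAREN, ")"));
                i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && isIdPart(input.charAt(i))) i++;
                String text = input.substring(start, i);
                Type type = text.equals("mytoken") ? Type.MYTOKEN : Type.IDENTIFIER;
                tokens.add(new Token(type, text));
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    static void expect(Token t, Type type) {
        if (t.type != type) {
            throw new IllegalArgumentException("expected " + type + " but got " + t.type);
        }
    }

    // idOrKeyword : (IDENTIFIER | MYTOKEN) { retag as IDENTIFIER } ;
    static void idOrKeyword(Token t) {
        if (t.type != Type.IDENTIFIER && t.type != Type.MYTOKEN) {
            throw new IllegalArgumentException("expected identifier or keyword");
        }
        t.type = Type.IDENTIFIER; // later phases only ever see IDENTIFIER here
    }

    // mygrammar : MYTOKEN LPAREN idOrKeyword RPAREN ;
    static List<Token> parse(String input) {
        List<Token> tokens = lex(input);
        if (tokens.size() != 4) {
            throw new IllegalArgumentException("expected exactly 4 tokens");
        }
        expect(tokens.get(0), Type.MYTOKEN);
        expect(tokens.get(1), Type.LPAREN);
        idOrKeyword(tokens.get(2));
        expect(tokens.get(3), Type.RPAREN);
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : parse("mytoken(mytoken)")) {
            System.out.println(t.type + " " + t.text);
        }
    }
}
```

Running main on "mytoken(mytoken)" prints the first "mytoken" as
MYTOKEN and the second as IDENTIFIER. The retagging happens in the
parser because only the parser knows which position it is in; the lexer
stays context-free.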

