[antlr-interest] [newbie] Lexer Confusion

Johannes Luber jaluber at gmx.de
Fri Jul 4 14:26:50 PDT 2008


UW Student wrote:
> Hello,
> 
> I'm having some trouble understanding the behaviour of Antlr's lexer.  I 
> am quite new to Antlr (having previously focussed on JFlex) so please 
> excuse me if this is a naive question.
> 
> My grammar is as follows
> 
> grammar Test;
> 
> nonTerm : TERM1 TERM2;
> 
> TERM1 : '..'+;
> TERM2 : '.';
> 
> However, when I try to recognize the string '...' (without the quotes), 
> AntlrWorks indicates a MismatchedTokenException.  (Looking at the 
> generated code, I believe this is because TERM1 is consuming the third 
> DOT and then failing to find a fourth.)  I do not understand why this is 
> happening.
> 
> The above example is a toy language that I created to try to isolate the 
> problem I was having.  My actual lexer looks more like this:
> 
> TERM1 : (' ' | '...')+
> TERM2 : '.'
> 
> And I would like ' .' to be lexed as [TERM1, TERM2].
> 
> Any suggestions would be greatly appreciated.
> 
> Thanks,
> Andrew
> 

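Once ANTLR's lexer has decided to match TERM1, it does not back out and 
try TERM2. With '...' the ('..')+ loop sees a third '.', starts another 
iteration, and then fails because there is no fourth. This is a 
limitation of the analysis algorithm. To get your result, you have to 
try something like: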

grammar Test2;

tokens{
TERM2;
}

nonTerm : TERM1 TERM2;


// Two dots stay TERM1; a lone dot is retyped as TERM2.
TERM1: '.' ( ('.')=> '.' | {$type = TERM2;} ) ;
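
To make the failure on '...' concrete, here is a small hand-written Java 
sketch. It is not ANTLR's generated code, and every name in it 
(DotLexerSketch, lex, predicateOnLoop) is hypothetical. It assumes the 
('..')+ loop decides on a single character of lookahead, as the 
generated code described in the question suggests, and compares that 
with a loop that checks for both dots before committing to another 
iteration.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration only; not what ANTLR generates.
final class DotLexerSketch {

    // Lexes a string of dots into TERM1 ('..'+) and TERM2 ('.') tokens.
    // predicateOnLoop == false: the loop re-enters on any '.', then fails
    //                           if the second '.' of the pair is missing.
    // predicateOnLoop == true:  the loop re-enters only if '..' follows,
    //                           leaving a trailing '.' for TERM2.
    static List<String> lex(String input, boolean predicateOnLoop) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (input.startsWith("..", pos)) {
                pos += 2; // first mandatory '..' of TERM1
                while (pos < input.length() && input.charAt(pos) == '.') {
                    boolean pairFollows = input.startsWith("..", pos);
                    if (!pairFollows) {
                        if (predicateOnLoop) {
                            break; // leave the lone '.' for the next token
                        }
                        throw new IllegalStateException(
                            "mismatched input after index " + pos);
                    }
                    pos += 2;
                }
                tokens.add("TERM1");
            } else if (input.charAt(pos) == '.') {
                pos += 1;
                tokens.add("TERM2");
            } else {
                throw new IllegalStateException(
                    "unexpected character at index " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("...", true)); // prints [TERM1, TERM2]
        try {
            lex("...", false);
        } catch (IllegalStateException e) {
            System.out.println("one-char loop decision fails: " + e.getMessage());
        }
    }
}
```

In grammar terms, the predicateOnLoop == true case corresponds roughly 
to guarding the loop with a syntactic predicate, for example 
TERM1 : ( ('..')=> '..' )+ ; with TERM2 : '.' ; kept as a normal rule. 
That variant is untested here, so treat it as a starting point.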

Johannes