[antlr-interest] [newbie] Lexer Confusion

UW Student uw.anon at gmail.com
Fri Jul 4 14:46:51 PDT 2008


Johannes Luber wrote:
> UW Student schrieb:
>> Hello,
>>
>> I'm having some trouble understanding the behaviour of Antlr's lexer.  
>> I am quite new to Antlr (having previously focussed on JFlex) so 
>> please excuse me if this is a naive question.
>>
>> My grammar is as follows
>>
>> grammar Test;
>>
>> nonTerm : TERM1 TERM2;
>>
>> TERM1 : '..'+;
>> TERM2 : '.';
>>
>> However, when I try to recognize the string '...' (without the 
>> quotes), AntlrWorks indicates a MismatchedTokenException.  (Looking at 
>> the generated code, I believe this is because TERM1 is consuming the 
>> third DOT and then failing to find a fourth.)  I do not understand why 
>> this is happening.
>>
>> The above example is a toy language that I created to try to isolate 
>> the problem I was having.  My actual lexer looks more like this:
>>
>> TERM1 : (' ' | '...')+;
>> TERM2 : '.';
>>
>> And I would like ' .' to be lexed as [TERM1, TERM2].
>>
>> Any suggestions would be greatly appreciated.
>>
>> Thanks,
>> Andrew
>>
> 
> ANTLR doesn't try TERM2 once it decides to try TERM1. This is a 
> limitation of the analysis algorithm. To get your result, you have to 
> try something like:
> 
> grammar Test2;
> 
> tokens{
> TERM2;
> }
> 
> nonTerm : TERM1 TERM2;
> 
> 
> TERM1: '.' ( ('.')=> '.' {$type = TERM2;} )? ;
> 
> Johannes
> 
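The following small Python sketch is not ANTLR or JFlex code. It contrasts two lexing strategies on both grammars quoted above. `committed_lex` is a simplified, hypothetical model of the behaviour described in this thread. It picks a rule, decides each loop iteration on one character of lookahead, and never backs out, so a failed iteration raises an error, much like the MismatchedTokenException that AntlrWorks reports. `maximal_munch_lex` models the JFlex approach: it tries every rule with full regex backtracking and keeps the longest match, with ties going to the earlier rule. The rule encoding (`TOY`, `ACTUAL`) and the prediction step are assumptions made for illustration. They are not ANTLR's actual LL(*) analysis.

```python
import re


class MismatchedTokenError(Exception):
    """Stand-in for ANTLR's MismatchedTokenException in this model."""


# Hypothetical rule encoding: (name, literal alternatives, is_loop).
# is_loop=True models a rule body of the form (alt1 | alt2 | ...)+
TOY = [("TERM1", [".."], True), ("TERM2", ["."], False)]           # TERM1 : '..'+ ;
ACTUAL = [("TERM1", [" ", "..."], True), ("TERM2", ["."], False)]  # TERM1 : (' ' | '...')+ ;


def committed_lex(rules, text):
    """Simplified model of a lexer that commits to a rule and never backtracks."""
    tokens, pos = [], 0
    while pos < len(text):
        # Predict a rule: the longest literal matching here wins,
        # ties go to the rule listed first (an assumption of this model).
        best = None
        for name, alts, is_loop in rules:
            for alt in alts:
                if text.startswith(alt, pos) and (best is None or len(alt) > best[0]):
                    best = (len(alt), name, alts, is_loop)
        if best is None:
            raise MismatchedTokenError(f"no rule matches at {pos}")
        _, name, alts, is_loop = best
        start = pos
        while True:
            # Choose an alternative from ONE character of lookahead, then commit.
            alt = next((a for a in alts if pos < len(text) and a[0] == text[pos]), None)
            if alt is None:
                break  # nothing can start here, so leave the loop
            if not text.startswith(alt, pos):
                raise MismatchedTokenError(f"{name}: expected {alt!r} at {pos}")
            pos += len(alt)
            if not is_loop:
                break
        tokens.append((name, text[start:pos]))
    return tokens


def maximal_munch_lex(rules, text):
    """JFlex-style model: try every rule with backtracking, keep the longest match."""
    compiled = []
    for name, alts, is_loop in rules:
        body = "(?:" + "|".join(re.escape(a) for a in alts) + ")"
        compiled.append((name, re.compile(body + ("+" if is_loop else ""))))
    tokens, pos = [], 0
    while pos < len(text):
        best_len, best_name = 0, None
        for name, pattern in compiled:
            # Longest prefix starting at pos that this rule matches in full.
            for end in range(len(text), pos, -1):
                if pattern.fullmatch(text, pos, end):
                    if end - pos > best_len:  # strict '>' keeps the earlier rule on ties
                        best_len, best_name = end - pos, name
                    break
        if best_name is None:
            raise MismatchedTokenError(f"no rule matches at {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


for rules, text in ((TOY, "..."), (ACTUAL, " .")):
    try:
        print("committed:", committed_lex(rules, text))
    except MismatchedTokenError as err:
        print("committed: error,", err)
    print("maximal munch:", maximal_munch_lex(rules, text))
```

Under these assumptions, the committed model fails on both '...' and ' .'. The maximal-munch model yields [TERM1, TERM2] in each case, which is the result Andrew expected.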

Hi Johannes,

Thank you for your prompt response.

I still have a couple of questions:

1) In my original grammar, how did the lexer decide which rule to 
attempt first?  Did it just pick the one that would result in the longer 
match?

2) Can you please confirm my understanding of your use of a syntactic 
predicate?  On a single DOT, the lexer will return a TERM1 token.  On a 
double DOT, the lexer will return a TERM2 token.  If this is the case, 
won't a triple DOT be lexed as TERM2 TERM1 (rather than the reverse)?
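Question 2 can be checked mechanically. The Python sketch below is a hypothetical hand-written model of Johannes' rule. It reads the predicated group as optional, i.e. `TERM1 : '.' ( ('.')=> '.' {$type = TERM2;} )? ;`, which is how the question interprets it. It is not ANTLR-generated code. The predicate `('.')=>` is modelled as a one-character peek, and the action `{$type = TERM2;}` is modelled as relabelling the token being built.

```python
def predicate_lex(text):
    """Hypothetical model of TERM1 : '.' ( ('.')=> '.' {$type = TERM2;} )? ;"""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos] != ".":
            raise ValueError(f"unexpected {text[pos]!r} at {pos}")
        start, token_type = pos, "TERM1"
        pos += 1  # match the mandatory first '.'
        # Syntactic predicate ('.')=>: enter the optional group only if a '.' follows.
        if pos < len(text) and text[pos] == ".":
            pos += 1
            token_type = "TERM2"  # the action {$type = TERM2;} relabels the token
        tokens.append((token_type, text[start:pos]))
    return tokens


print(predicate_lex("..."))  # [('TERM2', '..'), ('TERM1', '.')]
```

Under this model, a triple DOT does come out as TERM2 TERM1. The rule pairs the first two dots before it ever sees the lone third one.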

Thanks,
Andrew

