[antlr-interest] Lexer code not generated as expected?

Tue Dec 15 11:58:24 PST 2009

Backtracking is for parsing. However it is really only useful when you can guarantee that the input will not have syntax errors or when the incoming language is so ambiguous that you really don't have any choice. Generally you want to left factor your grammar and remove ambiguities. 

The difference between ANTLR and say flex is that flex is a set of patterns which it tries to match one after the other (simplistically speaking) so it will try the first pattern that might match, that fails so it moves on to the next pattern that might match and so on. ANTLR however lexes using LL(n) grammar and in its analysis it distinguishes the tokens and then generates enough code to send the match down the predicted path. It does not generate code that goes all the way to the 'end' of the token then if it matches, selects that token. So, with your rules it sees the first few characters and says 'that has to be this token'. If you want the match or move on type behavior, then you explicitly do that with the predicates. Ter has talked about changing the lexers to behave a bit more like 'people expect' but that is not there right now. It is to do with seeing beyond the end of the token. 

So when you first look you might feel it is complicated, but you are soon used to looking at it and it begins to make sense. Generally I recommend that you don't just take the easy path when writing recognizers. Counter intuitively, turning on things like global backtracking requires that you know MORE about constructing grammars rather than less as you are effectively turning off all the warnings about ambiguities that ANTLR would give you. This is fine in some senses but you are also turning off warnings about things that you didn’t realize were there/possible/encoded. Being a bit more explicit in the way you specify things usually results in a simpler/better grammar, in my experience.

Jim

> -----Original Message-----
> From: Peter Boughton [mailto:boughtonp at gmail.com]
> Sent: Tuesday, December 15, 2009 11:25 AM
> To: Jim Idle
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Lexer code not generated as expected?
> 
> > You have to be more specific with the lexer here if you want that
> kind of behavior
> 
> I don't get why?
> 
> 
> The ANTLR book mentions auto-backtracking (which seems to be what is
> *not* happening here), and which can be turned on with "options
> {backtrack=true;}"
> 
> Would that not solve this problem?
> If so, are there any issues (aside from performance) that might be a
> reason not to have this turned on?
> 
> It certainly seems a general approach of "take the easy path, then
> optimise proven slow parts, as required" is more favourable than
> adding this complexity.