[antlr-interest] Lexer bug?

Loring Craymer lgcraymer at yahoo.com
Sun Oct 21 23:21:01 PDT 2007



----- Original Message ----
> From: Gavin Lambert <antlr at mirality.co.nz>
> To: antlr-interest at antlr.org
> Sent: Sunday, October 21, 2007 8:02:24 PM
> Subject: Re: [antlr-interest] Lexer bug?
> 
> At 13:49 22/10/2007, Clifford Heath wrote:
>  >This rule consumes digits and one ".", then stops - and that's not
>  >a legal token.
> 
> I've been complaining off and on about similar cases since the 
> early betas.  Some useful discussion came up a while back that the 
> predefined "Tokens" rule was being generated on the basis of 
> matching only one token, and all the lookahead is generated from 
> that same perspective; whereas if it were generated to match a 
> sequence of tokens instead it generated better lookahead.

Um, I don't think that this is quite right.  ANTLR 3 has an inelegant tendency to make k=1 decisions when it should not.  Specifically, any time there is an epsilon alternative (as in FRACTION?), ANTLR tends to decide on a single character of lookahead, as in "I see a '.'; therefore, this is a FRACTION" in Austin's NUMBER rule.  From my perspective, this is probably a bug in the LL* implementation: a lookahead DFA should be generated for such cases, replacing the "if (LA(1) == '.') mFRACTION()" test with one that looks past the '.' before committing to FRACTION.

I should also point out that ANTLR does not match "10." as a legal token; it matches "10." as a partial token and then finds no viable alternative for matching the second '.'.
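
To make the difference concrete, here is a rough hand-written sketch in Python.  It is not ANTLR output, and the grammar in the comment is my assumption about the shape of the rules under discussion (the actual NUMBER and FRACTION rules are not quoted in this message).  The names match_number, match_fraction and NoViableAlt are made up for illustration.  The k=1 branch mimics the "if (LA(1) == '.') mFRACTION()" test; the k=2 branch is one way a deeper lookahead decision could behave.

```python
# Hypothetical sketch, not ANTLR-generated code.
# Assumed grammar (not quoted in the thread):
#   NUMBER   : DIGIT+ FRACTION? ;
#   FRACTION : '.' DIGIT+ ;


class NoViableAlt(Exception):
    """Raised when the lexer cannot continue at the current character."""


def is_digit(c):
    # c may be the empty string when we look past the end of input.
    return len(c) == 1 and "0" <= c <= "9"


def la(text, pos, i):
    """Return the i-th lookahead character (1-based), or '' at end of input."""
    idx = pos + i - 1
    return text[idx] if idx < len(text) else ""


def match_fraction(text, pos):
    """Match '.' DIGIT+ at pos and return the position just after it."""
    if la(text, pos, 1) != ".":
        raise NoViableAlt(f"expected '.' at index {pos}")
    pos += 1
    start = pos
    while is_digit(la(text, pos, 1)):
        pos += 1
    if pos == start:
        raise NoViableAlt(f"expected digit at index {pos}")
    return pos


def match_number(text, pos, k):
    """Match NUMBER at pos; k is the lookahead depth used for FRACTION?."""
    start = pos
    while is_digit(la(text, pos, 1)):
        pos += 1
    if pos == start:
        raise NoViableAlt(f"expected digit at index {pos}")
    if k == 1:
        # "I see a '.'; therefore, this is a FRACTION"
        take_fraction = la(text, pos, 1) == "."
    else:
        # Look one character further before committing to FRACTION.
        take_fraction = la(text, pos, 1) == "." and is_digit(la(text, pos, 2))
    if take_fraction:
        pos = match_fraction(text, pos)
    return text[start:pos], pos
```

On the input "10..20", match_number(text, 0, k=1) consumes "10." and then raises NoViableAlt at the second '.', while match_number(text, 0, k=2) returns "10" and leaves ".." for whatever token comes next.  Both settings accept "10.5" as a single NUMBER.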

If the FRACTION rule is inlined, ANTLR 3 will probably do the right thing (I have not tested this example, but have had to resort to inlining in other cases).  Again, this indicates that Austin is correct in asserting that this is a bug:  there should be no difference between a rule invocation and the equivalent inlined token or character sequence.

--Loring





