[antlr-interest] Lexer bug? (with test cases!)

Wed Oct 24 02:57:44 PDT 2007

On Oct 24, 2007, at 12:13 PM, Loring Craymer wrote:
>>> lexer grammar test;
>>> NUMBER: ('0'..'9')+ ('.' ('0'..'9')+)?;
>>> OTHER: .;
>
> Take another look.  The '.' in the posted grammar is the character  
> '.', not a wildcard; there is no ambiguity, just an LL(2)  
> decision.  Unfortunately, the generated code makes an LL(1)  
> decision and generates runtime errors as a result.  This is not a  
> backtracking problem; note the selected workaround--it avoids  
> having an epsilon alternative, but depends on k>1.

Oh, sorry. You're talking about the (...)? subrule decision?  Ah,
well, it's the same really.  *Any* char can follow a token, so the
wildcard follows every decision.  On a '.', I can choose the dot in
the subrule or the wildcard in OTHER.  That's ambiguous, so I say
LL(1).  Lex does some backtracking to make it work more naturally.
ANTLR builds LL(*) recognizers, which are not tuned specifically for
building lexers as lex is.  Perhaps in the future there could be a way
for ANTLR to do this within the confines of LL(*).
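
To make the difference concrete, here is a minimal hand-written sketch of
the NUMBER subrule decision made with one versus two characters of
lookahead. It is not the code ANTLR generates; the names (number_ll1,
number_ll2, tokenize, LexError) are made up for illustration, and the
tokenizer only mimics the two rules of the test grammar.

```python
class LexError(Exception):
    """Raised when a rule commits to a path the input cannot satisfy."""


def is_digit(c):
    # Only '0'..'9', matching the grammar (str.isdigit accepts more).
    return "0" <= c <= "9"


def match_digits(text, pos):
    """Match ('0'..'9')+ at pos and return the offset after the digits."""
    if pos >= len(text) or not is_digit(text[pos]):
        raise LexError(f"expected digit at offset {pos}")
    while pos < len(text) and is_digit(text[pos]):
        pos += 1
    return pos


def number_ll1(text, pos):
    """NUMBER with the (...)? subrule predicted from one char of lookahead."""
    pos = match_digits(text, pos)
    if pos < len(text) and text[pos] == ".":
        # Seeing '.' alone commits to the subrule; fails if no digit follows.
        pos = match_digits(text, pos + 1)
    return pos


def number_ll2(text, pos):
    """NUMBER with the (...)? subrule predicted from two chars of lookahead."""
    pos = match_digits(text, pos)
    if pos + 1 < len(text) and text[pos] == "." and is_digit(text[pos + 1]):
        # Enter the subrule only when '.' is followed by a digit.
        pos = match_digits(text, pos + 1)
    return pos


def tokenize(text, number_rule):
    """Split text into NUMBER and OTHER tokens using the given NUMBER rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        if is_digit(text[pos]):
            end = number_rule(text, pos)
            tokens.append(("NUMBER", text[pos:end]))
        else:
            end = pos + 1  # OTHER: any single character
            tokens.append(("OTHER", text[pos:end]))
        pos = end
    return tokens
```

With this sketch, tokenize("12.x", number_ll2) yields NUMBER "12" followed
by two OTHER tokens, while tokenize("12.x", number_ll1) raises LexError
because the rule has already committed to the fractional part.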

Terence