[antlr-interest] Lexer bug? (with test cases!)

Loring Craymer lgcraymer at yahoo.com
Wed Oct 24 09:51:56 PDT 2007



----- Original Message ----
> From: Terence Parr <parrt at cs.usfca.edu>
> To: antlr-interest Interest <antlr-interest at antlr.org>
> Sent: Wednesday, October 24, 2007 2:57:44 AM
> Subject: Re: [antlr-interest] Lexer bug? (with test cases!)
> 
> 
> On Oct 24, 2007, at 12:13 PM, Loring Craymer wrote:
> >>> lexer grammar test;
> >>> NUMBER: ('0'..'9')+ ('.' ('0'..'9')+)?;
> >>> OTHER: .;
> >
> > Take another look.  The '.' in the posted grammar is the character  
> > '.', not a wildcard; there is no ambiguity, just an LL(2)  
> > decision.  Unfortunately, the generated code makes an LL(1)  
> > decision and generates runtime errors as a result.  This is not a  
> > backtracking problem; note the selected workaround: it avoids  
> > having an epsilon alternative, but depends on k>1.
> 
> 
> Oh, sorry. You're talking about the (...)? subrule decision?  Ah,  
> well, it's the same really.  *Any* char can follow a token, so the  
> wildcard follows every decision.  I can choose dot or wildcard.   
> That's ambiguous, so I say LL(1).  Lex does some backtracking to make it  
> work more naturally.  ANTLR builds LL(*) recognizers, which are not  
> tuned specifically for building lexers as lex is.  Perhaps in the  
> future there could be a way for ANTLR to do this within the confines of  
> LL(*).

I believe that you have a blind spot here.  An IDENT rule, for example, cannot be followed by an alphanumeric character, because any such character would have been matched as part of the IDENT.  In general, there are character sequences that cannot follow a given rule, because they would be included in the rule's match.  ANTLR 3 lexers do not support FOLLOW sets (at least at token boundaries):  this is an implementation "feature", not an LL(*) behavior, and it results from the assumption that "any char can follow a token" (which translates to "there is always an LL(1) decision to be made here").
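To make the disputed decision concrete, here is a minimal hand-written sketch in Python of a lexer for the posted NUMBER/OTHER grammar. It is not ANTLR-generated code; the names lex, lookahead, and LexError are hypothetical, chosen only for illustration. With lookahead=1 the optional ('.' ('0'..'9')+)? subrule is entered whenever a '.' is seen, so input such as "1.x" produces an error. With lookahead=2 the '.' is consumed only when a digit follows it; otherwise NUMBER ends and the '.' is left for OTHER, which is the LL(2) behavior described above.

```python
# Hypothetical hand-written lexer for the posted grammar (illustration only):
#   NUMBER: ('0'..'9')+ ('.' ('0'..'9')+)?;
#   OTHER:  .;
# Tokens are (type, text) tuples.


class LexError(Exception):
    pass


def is_digit(c):
    # Matches exactly the grammar's '0'..'9' range.
    return '0' <= c <= '9'


def lex(text, lookahead=2):
    """Tokenize text.

    lookahead=1 commits to the optional fraction on seeing '.' alone;
    lookahead=2 also requires a digit after the '.' before committing.
    """
    tokens = []
    i, n = 0, len(text)
    while i < n:
        if is_digit(text[i]):
            start = i
            while i < n and is_digit(text[i]):
                i += 1
            # Decision: enter the optional ('.' ('0'..'9')+)? subrule?
            if lookahead >= 2:
                enter = i + 1 < n and text[i] == '.' and is_digit(text[i + 1])
            else:
                enter = i < n and text[i] == '.'
            if enter:
                i += 1  # consume '.'
                if not (i < n and is_digit(text[i])):
                    raise LexError(f"expected digit at offset {i}")
                while i < n and is_digit(text[i]):
                    i += 1
            tokens.append(("NUMBER", text[start:i]))
        else:
            # Any other single character becomes OTHER.
            tokens.append(("OTHER", text[i]))
            i += 1
    return tokens
```

For example, lex("1.x") yields NUMBER "1", OTHER ".", OTHER "x", while lex("1.x", lookahead=1) raises LexError because the LL(1) decision consumed the '.' before discovering that no digit follows it.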

--Loring

> 
> Terence
> 




