[antlr-interest] solution to lexer issue

Terence Parr parrt at cs.usfca.edu
Wed Oct 24 19:30:55 PDT 2007


We agree ANTLR is behaving unnaturally here (i.e., it's wrong).  It
is, however, behaving as I designed it.  There is no way to
make it work better using the current static analysis and current
assumptions.  I can turn the error message back on, which would at
least make ANTLR behave sanely (an error message whenever there is a
problem), but this would likely also turn on errors between keyword
and ID rules.

The solution is to change my assumption that any char can follow a
token (some of you don't believe me that this is the problem, but it
is).  Anyhoo, if I assume valid input instead, then all of a sudden I
have a fighting chance.  If I let ANTLR's static analysis roam beyond
a token to the start of any valid token, it will clearly see a
problem in the following cases, which need k>1:

NUMBER: ('0'..'9')+ ('.' ('0'..'9')+)?;
DOT : '.' ;

NUMBER: ('0'..'9')+ ('.' ('0'..'9')+)?;
OTHER: .;

ONE: 'one';
TWO: 'two';
OTHER: .;
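
To make the k>1 point concrete for the first case (NUMBER / DOT), here
is a minimal hand-written Python sketch.  It is not ANTLR output, and
the names is_digit, lex_number, and tokenize are made up for
illustration.  It assumes a lexer that decides whether to enter the
optional fraction subrule using k chars of lookahead: with k=1 the '.'
alone predicts the fraction, with k=2 a digit must also follow.

```python
def is_digit(c):
    return "0" <= c <= "9"


def lex_number(text, pos, k):
    """Match NUMBER at pos; return the end index, or None if no digit at pos."""
    start = pos
    while pos < len(text) and is_digit(text[pos]):
        pos += 1
    if pos == start:
        return None
    if pos < len(text) and text[pos] == ".":
        # k=1 commits to the fraction on '.' alone;
        # k=2 also requires a digit right after the '.'.
        digit_follows = pos + 1 < len(text) and is_digit(text[pos + 1])
        if k == 1 or digit_follows:
            pos += 1
            if not (pos < len(text) and is_digit(text[pos])):
                raise SyntaxError(f"expected digit at index {pos}")
            while pos < len(text) and is_digit(text[pos]):
                pos += 1
    return pos


def tokenize(text, k):
    """Split text into NUMBER and DOT tokens using k chars of lookahead."""
    tokens, pos = [], 0
    while pos < len(text):
        end = lex_number(text, pos, k)
        if end is not None:
            tokens.append(("NUMBER", text[pos:end]))
            pos = end
        elif text[pos] == ".":
            tokens.append(("DOT", "."))
            pos += 1
        else:
            raise SyntaxError(f"unexpected char at index {pos}")
    return tokens
```

In this sketch, tokenize("1.", 2) yields NUMBER then DOT, while
tokenize("1.", 1) raises an error because the k=1 decision has already
committed to the fraction before noticing that no digit follows.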

Runtime errors might get messed up a bit by my new assumption. I'll  
have to investigate that.

I didn't say there was NO solution, just not one using LL(*) together
with the assumption that any char can follow a token.

Note that all three examples are ambiguous: for the same input,
different rules can match.  I'll try to hush as many warnings as
possible while leaving the important ones, and while forcing ANTLR to
see beyond one token to the start of any other.  I'm not sure this
will work properly in all cases.  I will have to look further.
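
As an illustration of that ambiguity for the NUMBER / DOT pair, the
following Python sketch enumerates every way an input can be split into
tokens.  It uses Python regexes as stand-ins for the two lexer rules;
RULES and tokenizations are hypothetical names for this example only,
and nothing here reflects how ANTLR itself analyzes a grammar.

```python
import re

# Regex stand-ins for the two lexer rules from the first example.
RULES = [
    ("NUMBER", re.compile(r"[0-9]+(\.[0-9]+)?")),
    ("DOT", re.compile(r"\.")),
]


def tokenizations(text):
    """Return every token sequence that consumes all of text."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        piece = text[:end]
        for name, pattern in RULES:
            if pattern.fullmatch(piece):
                for rest in tokenizations(text[end:]):
                    results.append([(name, piece)] + rest)
    return results
```

For the input "1.5" this finds two readings: a single NUMBER, or
NUMBER DOT NUMBER.  A lexer has to pick one of them, typically the
longest match.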

I believe this solution will satisfy everyone.  I added an improvement
request:

http://www.antlr.org:8888/browse/ANTLR-189

Ter

