[antlr-interest] solution to lexer issue
Terence Parr
parrt at cs.usfca.edu
Wed Oct 24 19:30:55 PDT 2007
We agree ANTLR is behaving unnaturally here (i.e., it's wrong). It
is, however, behaving as I designed it. There is no way to
make it work better using the current static analysis and current
assumptions. I can turn the error message back on, which would at
least make ANTLR behave sanely (an error message whenever there is a
problem), but this would likely also trigger errors between keyword
and ID rules.
The solution is to change my assumption that any char can follow a
token (some of you don't believe me that this is the problem, but it
is). Anyhoo, if I instead assume valid input, then all of a sudden I
have a fighting chance. If I let ANTLR's static analysis roam beyond
a token to the start of any valid token, it will clearly see a
problem in the following cases that need k>1:

    NUMBER: ('0'..'9')+ ('.' ('0'..'9')+)?;
    DOT : '.' ;

    NUMBER: ('0'..'9')+ ('.' ('0'..'9')+)?;
    OTHER: .;

    ONE: 'one';
    TWO: 'two';
    OTHER: .;

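To make the k>1 point concrete for the first pair of rules (NUMBER and DOT), here is a minimal hand-written sketch in Python. It is not ANTLR output and not how ANTLR's analysis works; the names `is_digit` and `lex` are made up for illustration. It only shows that, once the digits are consumed, seeing a '.' is not enough: the lexer also has to peek at the character after it to decide whether the '.' belongs to NUMBER or is a separate DOT.

```python
def is_digit(ch):
    # Matches the grammar's '0'..'9' range exactly (ASCII digits only).
    return ch in "0123456789"


def lex(text):
    """Tokenize text under the NUMBER/DOT rules (hypothetical helper)."""
    tokens = []
    i = 0
    while i < len(text):
        if is_digit(text[i]):
            start = i
            while i < len(text) and is_digit(text[i]):
                i += 1
            # Decision point: one char of lookahead ('.') cannot tell us
            # whether a fraction follows. We need the char after it too.
            if i + 1 < len(text) and text[i] == "." and is_digit(text[i + 1]):
                i += 1  # consume '.'
                while i < len(text) and is_digit(text[i]):
                    i += 1
            tokens.append(("NUMBER", text[start:i]))
        elif text[i] == ".":
            tokens.append(("DOT", "."))
            i += 1
        else:
            raise ValueError("no rule matches %r at offset %d" % (text[i], i))
    return tokens


print(lex("1.5"))
print(lex("1."))
```

With only the '.' visible, "1.5" and "1." look identical at the decision point; the second character of lookahead is what separates them.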
Runtime errors might get messed up a bit by my new assumption. I'll
have to investigate that.
I didn't say there was NO solution, just that there is none using
LL(*) together with the assumption that any char can follow a token.
Note that all three examples are ambiguous: the same input can be
matched by different rules. I'll try to hush as many warnings as
possible while leaving the important ones and while forcing ANTLR to
see beyond one token to any other. I'm not sure this will work
properly in all cases. I will have to look further.
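The ambiguity can be checked by brute force. The sketch below is only an illustration in Python, assuming the grammar rules translate to the regular expressions shown; `tokenizations`, `NUMBER_DOT`, and `KEYWORDS` are hypothetical names, and this is not ANTLR's algorithm. It lists every way of splitting an input into tokens when any valid token may follow any other.

```python
import re

# Assumed regex translations of the grammar rules above.
NUMBER_DOT = [
    ("NUMBER", r"[0-9]+(\.[0-9]+)?"),
    ("DOT", r"\."),
]
KEYWORDS = [
    ("ONE", r"one"),
    ("TWO", r"two"),
    ("OTHER", r"."),
]


def tokenizations(text, rules):
    """Return every token sequence that covers text exactly."""
    if not text:
        return [[]]  # one way to tokenize nothing: the empty sequence
    results = []
    for end in range(1, len(text) + 1):
        piece = text[:end]
        for name, pattern in rules:
            # fullmatch: the rule must match the whole candidate piece.
            if re.fullmatch(pattern, piece):
                for rest in tokenizations(text[end:], rules):
                    results.append([(name, piece)] + rest)
    return results


for seq in tokenizations("1.2", NUMBER_DOT):
    print(seq)
for seq in tokenizations("one", KEYWORDS):
    print(seq)
```

For "1.2" this finds both a single NUMBER and the sequence NUMBER DOT NUMBER; for "one" it finds both ONE and three OTHER tokens. That is the sense in which the same input can be matched by different rules.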
I believe this solution will satisfy everyone. I added an improvement
request:
http://www.antlr.org:8888/browse/ANTLR-189
Ter