[antlr-interest] solution to lexer issue

Thu Oct 25 03:40:13 PDT 2007

At 15:30 25/10/2007, Terence Parr wrote:
 >Solution is to change my assumption that any char can follow a
 >token (some of you don't believe me that is the problem but it
 >is).

I'm curious: isn't this "any char can follow a token" assumption 
only true for (a) filter=true or (b) malformed input?  Neither of 
those ought to be the common case.

And I'm still not sure how assuming that any character at all 
could follow the "e" in "one" means that you don't need to test 
that the "e" is actually there at all.  But I'll take your word 
for it.

 >NUMBER: ('0'..'9')+ ('.' ('0'..'9')+)?;
 >DOT : '.' ;
 >
 >NUMBER: ('0'..'9')+ ('.' ('0'..'9')+)?;
 >OTHER: .;
 >
 >ONE: 'one';
 >TWO: 'two';
 >OTHER: .;
[...]
 >Note that all three examples are ambiguous. Same input,
 >different rules can match.

If all rules have equal precedence, then sure, they're 
ambiguous.  But I thought the lexer was supposed to have a defined 
precedence (longest match and/or first listed token wins)?  
Certainly the generated mTokens rule appears to test them in 
order...
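
To make concrete what I mean by a defined precedence, here is a 
minimal sketch in Python of "longest match wins, ties go to the 
first listed rule".  It is not what ANTLR generates and says 
nothing about how mTokens predicts alternatives; the rule tables 
are regex approximations of the quoted grammars, and the names 
tokenize, WORD_RULES and NUMBER_RULES are made up for illustration.

```python
import re

# Hypothetical rule tables approximating the quoted examples.
# List order is significant: earlier rules win ties.
WORD_RULES = [
    ("ONE", re.compile(r"one")),
    ("TWO", re.compile(r"two")),
    ("OTHER", re.compile(r".", re.DOTALL)),
]
NUMBER_RULES = [
    ("NUMBER", re.compile(r"[0-9]+(\.[0-9]+)?")),
    ("DOT", re.compile(r"\.")),
]


def tokenize(text, rules):
    """Split text into (rule name, lexeme) pairs.

    At each position every rule is tried; the longest match wins,
    and on equal length the rule listed first is kept.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Strict '>' means a later rule cannot displace an
            # earlier one that matched the same length.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError("no rule matches at offset %d" % pos)
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


if __name__ == "__main__":
    print(tokenize("one", WORD_RULES))
    print(tokenize("onx", WORD_RULES))
    print(tokenize("1.", NUMBER_RULES))
```

Under this scheme "one" comes out as a single ONE token (ONE is 
longer than OTHER), "onx" falls back to three OTHER tokens because 
ONE never matches without its "e", and "1." becomes NUMBER followed 
by DOT.  With a precedence like that the three examples would stop 
being ambiguous, which is why I assumed the lexer worked this way.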

 >I believe that solution will satisfy everyone.    Added
 >improvement request:
 >
 >http://www.antlr.org:8888/browse/ANTLR-189

Cool! :)