[antlr-interest] Lexer bug?
Austin Hastings
Austin_Hastings at Yahoo.com
Sun Oct 21 06:35:37 PDT 2007
You're right. I looked at your definition of NUMBER and just assumed you
were using the common one. It looks like a bug.
In fact (some time later), I'm now looking at the generated code with
new disrespect. The tokenizer does a minimal look-ahead and then
commits to a token: when it sees '1' in your 10..20 example, it
commits to NUMBER. When it reaches the '.', it commits to FRACTION. There
doesn't appear to be any provision for one path failing and another
being chosen instead.
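To illustrate the behaviour described above, here is a minimal Python
sketch. It is not ANTLR's generated code. It is a toy hand-written lexer,
and it assumes rules roughly like NUMBER: DIGIT+ FRACTION?,
FRACTION: '.' DIGIT+ and RANGE: '..'. The function name lex and the
backtrack flag are made up for the example.

```python
def lex(text, backtrack):
    """Toy lexer. Assumed (hypothetical) rules:
         NUMBER   : DIGIT+ FRACTION? ;
         FRACTION : '.' DIGIT+ ;
         RANGE    : '..' ;
    With backtrack=False a failed FRACTION is a hard error (commit on '.').
    With backtrack=True the lexer rewinds and leaves the '.' for the next token.
    """
    tokens, i = [], 0
    while i < len(text):
        if text[i].isdigit():
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            if i < len(text) and text[i] == ".":
                # One character of look-ahead says "FRACTION"; try to match it.
                j = i + 1
                while j < len(text) and text[j].isdigit():
                    j += 1
                if j > i + 1:
                    i = j  # FRACTION matched at least one digit
                elif not backtrack:
                    raise SyntaxError(f"expected digit at position {i + 1}")
                # Otherwise: rewind by leaving i at the '.'.
            tokens.append(("NUMBER", text[start:i]))
        elif text.startswith("..", i):
            tokens.append(("RANGE", ".."))
            i += 2
        else:
            raise SyntaxError(f"unexpected character {text[i]!r} at position {i}")
    return tokens


if __name__ == "__main__":
    print(lex("10..20", backtrack=True))
    try:
        lex("10..20", backtrack=False)
    except SyntaxError as err:
        print("committing lexer failed:", err)
```

With backtrack=False the sketch fails on 10..20 in the same way as
described: it commits to FRACTION on the first '.' and never reconsiders.
With backtrack=True it yields NUMBER, RANGE, NUMBER.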
I'm not sure I should thank you for it, but I'm glad you brought this
up. :-(
=Austin
Clifford Heath wrote:
> Austin Hastings wrote:
>> I'm guessing it's because "10..20" is two completely valid tokens:
>> 10. and .20. Both NUMBERs, of course.
>
> 10. is not a valid token unless followed by another digit. That's why
> I mentioned using DIGIT+ instead of DIGIT* in FRACTION.
>
>> Keep in mind that you have two different machines at work.
>
> Yes - see my post explaining that to Simon West, for example.
>
>> In theory, you should be able to describe integer literals, the range
>> operator, and floats, in that order, and have it work. But I'd try it...
>
> Interesting thought... Not relevant here, but I'll try it sometime.
>
> Clifford Heath.
>