[antlr-interest] Lexer bug?

Austin Hastings Austin_Hastings at Yahoo.com
Sun Oct 21 06:35:37 PDT 2007


You're right. I looked at your definition of NUMBER and just assumed you 
were using the common one. It looks like a bug.

In fact (some time later), I'm looking at the generated code now with 
new disrespect. The tokenizer does a minimal look-ahead and then 
commits to a token: when it sees '1' in your 10..20 example, it 
commits to NUMBER. When it reaches the first '.', it commits to 
FRACTION. There doesn't appear to be any provision for one path 
failing and another being chosen instead.
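
To make the difference concrete, here is a rough sketch in Python. It is 
not ANTLR's generated code, and the token rules are only my assumption of 
a typical grammar (NUMBER is DIGIT+ with an optional FRACTION, FRACTION 
is '.' DIGIT+, and '..' is a range operator). The function names are 
made up for illustration. The first lexer commits to FRACTION on one 
character of look-ahead; the second backs off when the fraction does 
not pan out.

```python
import re

def lex_committing(text):
    """Commit to FRACTION as soon as '.' follows the digits; never back out."""
    tokens, i = [], 0
    while i < len(text):
        if text[i].isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            if j < len(text) and text[j] == '.':
                # One character of look-ahead says "fraction": commit to it.
                j += 1
                frac_start = j
                while j < len(text) and text[j].isdigit():
                    j += 1
                if j == frac_start:
                    # No digit after '.', and no alternative is reconsidered.
                    raise SyntaxError(f"expected digit at offset {j}")
            tokens.append(("NUMBER", text[i:j]))
            i = j
        elif text.startswith("..", i):
            tokens.append(("RANGE", ".."))
            i += 2
        else:
            raise SyntaxError(f"unexpected {text[i]!r} at offset {i}")
    return tokens

# Assumed rule: DIGIT+ ('.' DIGIT+)?  The regex engine backtracks out of
# the optional fraction when no digit follows the '.'.
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def lex_backtracking(text):
    """Take the fraction only if it fully matches; otherwise fall back."""
    tokens, i = [], 0
    while i < len(text):
        m = NUMBER.match(text, i)
        if m:
            tokens.append(("NUMBER", m.group()))
            i = m.end()
        elif text.startswith("..", i):
            tokens.append(("RANGE", ".."))
            i += 2
        else:
            raise SyntaxError(f"unexpected {text[i]!r} at offset {i}")
    return tokens
```

With these assumed rules, lex_backtracking("10..20") yields NUMBER, 
RANGE, NUMBER, while lex_committing("10..20") raises an error at the 
second '.', which is the behaviour I think I'm seeing in the generated 
lexer.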

I'm not sure I should thank you for it, but I'm glad you brought this 
up. :-(

=Austin


Clifford Heath wrote:
> Austin Hastings wrote:
>> I'm guessing it's because "10..20" is two completely valid tokens: 
>> 10. and .20. Both NUMBERs, of course.
>
> 10. is not a valid token unless followed by another digit. That's why
> I mentioned using DIGIT+ instead of DIGIT* in FRACTION.
>
>> Keep in mind that you have two different machines at work.
>
> Yes - see my post explaining that to Simon West, for example.
>
>> In theory, you should be able to describe integer literals, the range 
>> operator, and floats, in that order, and have it work. But I'd try it...
>
> Interesting thought... Not relevant here, but I'll try it sometime.
>
> Clifford Heath.
>
>
>


