[antlr-interest] Lexer bug?
Gavin Lambert
antlr at mirality.co.nz
Tue Oct 23 05:31:01 PDT 2007
At 01:00 24/10/2007, Clifford Heath wrote:
>And then there's the fact that the string above has
>white-space embedded, which means that it potentially
>interacts negatively with the whitespace handling...
>or maybe not in this case.
All lexer rules that permit embedded whitespace must specify it
explicitly, since the whitespace hiding/skipping rule is itself
just another lexer rule at the same "level". So that could
complicate your rule a bit if you wanted to handle it in there.
But if you're using Jim's rule (with multi-token emitting added),
all you might need to do is specify that whitespace is allowed
after the second '.' of the '..' pair.
This is because there are four cases to consider, and all of them
should end up producing the same three tokens:

1. "10 .. 30": you'll already get it as three separate tokens
   without doing any extra work.
2. "10..30": you'll need to handle it within the one rule (because
   of the dot recognition problem) -- but you can then emit the
   same three tokens as in the first case.
3. "10 ..30": the first number will be handled OK by itself, then
   you'll have to break apart the combined "..30" in a single rule
   and output two tokens.
4. "10.. 30": you can either treat it like the second case (doing
   it all in one rule, by explicitly specifying the whitespace and
   outputting three tokens) or treat it like the third case
   (making a number with trailing ".." output two tokens).
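
To make the cases concrete, here is a rough hand-written sketch in
Python (not ANTLR, and not Jim's rule). The token names INT, REAL
and DOTDOT and the function names are made up for illustration. The
number rule returns a list so that a number with a trailing ".." can
emit two tokens at once; a leading ".." is simply handled by its own
rule here, which sidesteps the "..30" problem rather than solving it
the way a generated lexer would have to.

```python
# Hypothetical token names; a hand-written sketch, not ANTLR output.
INT, REAL, DOTDOT = "INT", "REAL", "DOTDOT"


def number_rule(text, i):
    """Match a number at offset i; return (tokens, next_offset).

    May return two tokens when the number is directly followed by '..'.
    """
    n = len(text)
    start = i
    while i < n and text[i].isdigit():
        i += 1
    if text.startswith("..", i):
        # Number with trailing '..': emit two tokens from the one rule.
        return [(INT, text[start:i]), (DOTDOT, "..")], i + 2
    if i < n and text[i] == ".":
        # A single '.' makes this a real literal.
        i += 1
        while i < n and text[i].isdigit():
            i += 1
        return [(REAL, text[start:i])], i
    return [(INT, text[start:i])], i


def tokenize(text):
    """Split text into (type, lexeme) pairs, skipping whitespace."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1  # whitespace between tokens is skipped
        elif ch.isdigit():
            emitted, i = number_rule(text, i)
            tokens.extend(emitted)
        elif text.startswith("..", i):
            tokens.append((DOTDOT, ".."))
            i += 2
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens
```

With this sketch, tokenize("10 .. 30"), tokenize("10..30"),
tokenize("10 ..30") and tokenize("10.. 30") all return the same
INT, DOTDOT, INT sequence, while "1.5" still comes out as one REAL.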
>Still, I already dislike that I have to re-lex a NUMBER
>to find whether it's octal, hex, integer or real.
>I already paid a lexer to do that for me, so why am I
>doing it again?
I don't know -- why are you? There's certainly no need to -- just
output different tokens in each case and then make a parser rule
that accepts any of them when you're in a context that doesn't
care what kind of numeric literal is provided.
(This is actually easier to do with a rule similar to what Jim
proposed, since each path through the rule is more explicitly
spelled out.)
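
As an illustration of the idea (again in Python rather than ANTLR,
so treat it as a sketch only), the lexer below tags each literal
with its own token type, and a parser-level number() accepts any of
them. The type names HEX, OCTAL, REAL and INT, the C-style literal
formats, and the helper names are all assumptions made for the
example.

```python
import re

# Hypothetical token types, one per kind of numeric literal.
# Order matters: more specific patterns are tried first.
TOKEN_SPEC = [
    ("HEX", r"0[xX][0-9a-fA-F]+"),
    ("REAL", r"[0-9]+\.[0-9]+"),
    ("OCTAL", r"0[0-7]+"),
    ("INT", r"[0-9]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

NUMBER_TYPES = {name for name, _ in TOKEN_SPEC}


def lex_number(text):
    """Classify a numeric literal once, in the lexer."""
    m = MASTER.fullmatch(text)
    if m is None:
        raise ValueError(f"not a numeric literal: {text!r}")
    return (m.lastgroup, text)


def number(token):
    """Parser-level rule, roughly: number : HEX | OCTAL | REAL | INT ;"""
    kind, _ = token
    if kind not in NUMBER_TYPES:
        raise SyntaxError(f"expected a number, got {kind}")
    return token


def value(token):
    """Convert using the token type; no second look at the digits needed."""
    kind, text = token
    if kind == "HEX":
        return int(text, 16)
    if kind == "OCTAL":
        return int(text, 8)
    if kind == "REAL":
        return float(text)
    return int(text)
```

Contexts that care about the kind of literal can inspect the token
type directly; contexts that don't just go through number().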