[antlr-interest] Lexing problem I cannot resolve
Gavin Lambert
antlr at mirality.co.nz
Sun Aug 3 05:01:32 PDT 2008
At 23:21 3/08/2008, Raphael Reitzig wrote:
>But do I understand correctly that in your language '..5' is a
>valid range? What range is that? I only had 'INT..INT' in mind
>and would create a single token of it.
The problem is with the FLOAT rule. Faced with the input "1..5",
ANTLR will consider the INT rule (matches "1" only) and the FLOAT
rule (matches "1.") -- so FLOAT will win. (Once ANTLR sees
"digits followed by a dot" that's enough for it to reject INT in
favour of FLOAT -- it doesn't look ahead any further than
that). Once it's inside FLOAT, it will consume the dot, then
encounter another dot after it, say "that's not a digit!" and
exit. Now it has input of ".5" to go, so it parses that as
another FLOAT.
(I'm making some assumptions about the rules in the OP's grammar,
though, since they weren't actually posted. But I'm reasonably
sure they're valid.)
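
To make the failure concrete, here is a minimal Python sketch of the decision described above. It is a toy model, not ANTLR: the function name naive_lex is made up for illustration, and it assumes the same thing the explanation assumes, namely an INT rule of plain digits and a FLOAT rule of optional digits, a dot, and optional digits.

```python
# Toy model of the lexer decision described above (not ANTLR itself).
# Assumed rules: INT = digits; FLOAT = optional digits, '.', optional digits.

def naive_lex(text):
    """Tokenize text, committing to FLOAT as soon as a dot follows the digits."""
    tokens = []
    i = 0
    n = len(text)
    while i < n:
        start = i
        # Consume any leading digits.
        while i < n and text[i].isdigit():
            i += 1
        if i < n and text[i] == ".":
            # "Digits followed by a dot" is enough to choose FLOAT;
            # nothing beyond the dot is inspected before committing.
            i += 1
            while i < n and text[i].isdigit():
                i += 1
            lexeme = text[start:i]
            if lexeme == ".":
                raise ValueError("lone '.' at position %d" % start)
            tokens.append(("FLOAT", lexeme))
        elif i > start:
            tokens.append(("INT", text[start:i]))
        else:
            raise ValueError("unexpected character %r at %d" % (text[i], i))
    return tokens


print(naive_lex("1..5"))  # two FLOAT tokens, "1." and ".5"; no range at all
```

The range operator never appears in the token stream, because the first dot has already been consumed as part of "1." by the time the second dot is seen.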
The rules I posted (or minor variations thereof) should resolve
this kind of ambiguity.
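
Those rules are not reproduced in this message. As a rough illustration of the general idea only, the Python sketch below peeks one character past the dot before committing to FLOAT. It is a toy model rather than ANTLR grammar, and it is not claimed to match the rules referred to above; the name lookahead_lex is hypothetical, and TWO_DOTS is borrowed from the quoted parser rule.

```python
# Toy model of a lexer that looks one character past the dot (not ANTLR).
# Assumed tokens: INT, FLOAT, and TWO_DOTS for the ".." range operator.

def lookahead_lex(text):
    """Tokenize text, treating '.' as a decimal point only if not followed by '.'."""
    tokens = []
    i = 0
    n = len(text)
    while i < n:
        start = i
        # A pair of dots is always the range operator.
        if text.startswith("..", i):
            tokens.append(("TWO_DOTS", ".."))
            i += 2
            continue
        while i < n and text[i].isdigit():
            i += 1
        # Extra lookahead: only enter FLOAT when the dot is not
        # the first half of "..".
        if i < n and text[i] == "." and not text.startswith("..", i):
            i += 1
            while i < n and text[i].isdigit():
                i += 1
            lexeme = text[start:i]
            if lexeme == ".":
                raise ValueError("lone '.' at position %d" % start)
            tokens.append(("FLOAT", lexeme))
        elif i > start:
            tokens.append(("INT", text[start:i]))
        else:
            raise ValueError("unexpected character %r at %d" % (text[i], i))
    return tokens


print(lookahead_lex("1..5"))  # INT "1", TWO_DOTS "..", INT "5"
```

In a real ANTLR grammar the same effect would have to be expressed in the lexer rules themselves (for example with a predicate on the lookahead), which this sketch does not attempt to show.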
>numerical construct :
> a=INT THREE_DOTS -> ^(ELLIPSIS $a)
>| a=INT TWO_DOTS b=INT -> ^(RANGE $a $b)
>| a=INT? ONE_DOT b=INT -> ^(FLOAT ($a + $b))
>| a=INT -> ^(INTEGER $a);
The trouble with doing this kind of thing in the parser is that
you no longer have single tokens. In addition to giving the tree
walker extra work to deal with, this also means that any HIDDEN
or off-channel tokens produced by the lexer could be sitting
silently between the pieces, where the parser never sees them.
This isn't always a bad thing -- after all, it will let you parse
"1./*foo*/5" as if it were "1.5" -- but that might turn out to be
more confusing than helpful. And it would similarly parse
"1. 5" as "1.5" as well, which is usually less
desirable. (Assuming that comments and whitespace are being
hidden.)
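
The effect can be sketched with a toy Python model of a lexer that tags comments and whitespace as hidden, and a parser that only looks at the default channel. This is not ANTLR; the names lex_with_channels and parser_view are invented for the example, and the token names INT and ONE_DOT follow the quoted parser rule.

```python
import re

# Toy model of channel filtering (not ANTLR). Comments and whitespace are
# assumed to be sent to a hidden channel, as in the discussion above.
TOKEN_RE = re.compile(
    r"(?P<COMMENT>/\*.*?\*/)|(?P<WS>\s+)|(?P<INT>\d+)|(?P<ONE_DOT>\.)",
    re.S,
)


def lex_with_channels(text):
    """Return (kind, lexeme, channel) triples for the whole input."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected character %r at %d" % (text[pos], pos))
        kind = m.lastgroup
        channel = "hidden" if kind in ("COMMENT", "WS") else "default"
        tokens.append((kind, m.group(), channel))
        pos = m.end()
    return tokens


def parser_view(tokens):
    """Drop hidden-channel tokens, as a parser reading the default channel would."""
    return [(kind, lexeme) for kind, lexeme, channel in tokens if channel == "default"]


# All three inputs look identical once the hidden tokens are filtered out.
for source in ("1.5", "1./*foo*/5", "1.   5"):
    print(repr(source), parser_view(lex_with_channels(source)))
```

A lexer-level FLOAT rule does not have this problem, since the comment or whitespace would interrupt the token before the parser is ever involved.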