[antlr-interest] Lexing problem I cannot resolve

Gavin Lambert antlr at mirality.co.nz
Sun Aug 3 05:01:32 PDT 2008


At 23:21 3/08/2008, Raphael Reitzig wrote:
 >But do I understand correctly that in your language '..5' is a
 >valid range? What range is that? I only had 'INT..INT' in mind
 >and would create a single token of it.

The problem is with the FLOAT rule.  Faced with the input "1..5", 
ANTLR will consider the INT rule (matches "1" only) and the FLOAT 
rule (matches "1.") -- so FLOAT will win.  (Once ANTLR sees 
"digits followed by a dot" that's enough for it to reject INT in 
favour of FLOAT -- it doesn't look ahead any further than 
that).  Once it's inside FLOAT, it will consume the dot, then 
encounter another dot after it, say "that's not a digit!" and 
exit.  Now it has input of ".5" to go, so it parses that as 
another FLOAT.
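
To make that sequence concrete, here is a small Python simulation of 
the behaviour described above.  It is only a sketch, not ANTLR's 
actual algorithm: the function name naive_lex is made up, and the 
INT and FLOAT shapes it uses (digits, versus optional digits plus a 
dot plus optional digits) are the same guesses about the OP's 
grammar that I'm making in this message.

```python
DIGITS = "0123456789"

def naive_lex(text):
    """Commit to FLOAT as soon as 'digits followed by a dot' is seen."""
    tokens = []
    i = 0
    while i < len(text):
        start = i
        # Consume any leading digits.
        while i < len(text) and text[i] in DIGITS:
            i += 1
        if i < len(text) and text[i] == ".":
            # A dot is enough to pick FLOAT; nothing past it is checked.
            i += 1
            while i < len(text) and text[i] in DIGITS:
                i += 1
            tokens.append(("FLOAT", text[start:i]))
        elif i > start:
            tokens.append(("INT", text[start:i]))
        else:
            raise ValueError("unexpected character %r" % text[i])
    return tokens

print(naive_lex("1..5"))  # [('FLOAT', '1.'), ('FLOAT', '.5')]
```

The range operator never appears as a token at all, which is why 
the parser has nothing to match a range against.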

(I'm making some assumptions about the rules in the OP's grammar, 
though, since they weren't actually posted.  But I'm reasonably 
sure they're valid.)

The rules I posted (or minor variations thereof) should resolve 
this kind of ambiguity.
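
Those rules aren't repeated in this message, so the following is 
not a copy of them.  It is just a Python sketch of the general idea, 
assuming the fix is to peek one character past the dot before 
committing to FLOAT.  The function name lex_with_lookahead is 
hypothetical, and the TWO_DOTS token name is borrowed from the 
parser rule quoted below.

```python
DIGITS = "0123456789"

def lex_with_lookahead(text):
    """Treat '.' as a decimal point only when it is not part of '..'."""
    tokens = []
    i = 0
    n = len(text)
    while i < n:
        start = i
        if text.startswith("..", i):
            tokens.append(("TWO_DOTS", ".."))
            i += 2
            continue
        # Consume any leading digits.
        while i < n and text[i] in DIGITS:
            i += 1
        # Peek past the dot: "1." followed by another "." stays an INT.
        if i < n and text[i] == "." and not text.startswith("..", i):
            i += 1
            while i < n and text[i] in DIGITS:
                i += 1
            tokens.append(("FLOAT", text[start:i]))
        elif i > start:
            tokens.append(("INT", text[start:i]))
        else:
            raise ValueError("unexpected character %r" % text[i])
    return tokens

print(lex_with_lookahead("1..5"))
# [('INT', '1'), ('TWO_DOTS', '..'), ('INT', '5')]
```

Ordinary floats such as "1.5" and ".5" still come out as a single 
FLOAT token with this check in place.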

 >numerical construct :
 >   a=INT  THREE_DOTS     -> ^(ELLIPSIS $a)
 >| a=INT  TWO_DOTS b=INT -> ^(RANGE $a $b)
 >| a=INT? ONE_DOT b=INT  -> ^(FLOAT ($a + $b))
 >| a=INT                 -> ^(INTEGER $a);

The trouble with doing this kind of thing in the parser is that 
you no longer have single tokens.  In addition to giving the tree 
walker extra work, this also means that any HIDDEN or off-channel 
tokens produced by the lexer could be sitting, unnoticed, between 
the pieces of what looks like a single number.

This isn't always a bad thing -- after all, it will let you parse 
"1./*foo*/5" as if it were "1.5" -- but that might turn out to be 
more confusing than helpful.  And it would similarly parse 
"1.    5" as "1.5" as well, which is usually less 
desirable.  (Assuming that comments and whitespace are being 
hidden.)
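
A rough Python sketch of that effect, assuming a lexer that emits 
INT and ONE_DOT tokens and hides whitespace and /* */ comments from 
the parser.  The names visible_tokens and parse_float are 
hypothetical, and the regex is a stand-in for real lexer rules.

```python
import re

# INT and ONE_DOT reach the parser; HIDDEN tokens are dropped on the way.
TOKEN_RE = re.compile(
    r"(?P<INT>[0-9]+)|(?P<ONE_DOT>\.)|(?P<HIDDEN>\s+|/\*.*?\*/)"
)

def visible_tokens(text):
    """Return only the tokens a parser on the default channel would see."""
    out = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected character %r" % text[pos])
        if m.lastgroup != "HIDDEN":
            out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out

def parse_float(text):
    """Parser-level rule: INT ONE_DOT INT is assembled into a float."""
    toks = visible_tokens(text)
    if [kind for kind, _ in toks] != ["INT", "ONE_DOT", "INT"]:
        raise ValueError("not INT ONE_DOT INT")
    return float(toks[0][1] + "." + toks[2][1])

print(parse_float("1.5"))         # 1.5
print(parse_float("1./*foo*/5"))  # 1.5
print(parse_float("1.    5"))     # 1.5
```

All three inputs look identical to the parser, because the hidden 
tokens are gone before the rule is ever matched.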


