[antlr-interest] a simple (not for me :)) grammar problem

Gavin Lambert antlr at mirality.co.nz
Sun Jan 6 23:57:43 PST 2008


At 16:26 7/01/2008, Mark Volkmann wrote:
 >It should be easy, right? Ter already gave the hint that the
 >problem is that it was greedily grabbing the DOT for FLOAT
 >instead of leaving it for the separator between the number
 >and the identifier. Piece of cake? Well I've tried several
 >things I thought would work to no avail.
 >Why in the world doesn't this work?
[...]
 > backtrack = true; // I shouldn't need this, but I don't think it can hurt.

It's not going to help, either.  "backtrack = true" has no effect 
on the lexer.

 >FLOAT: NUMBER DOT NUMBER;
 >INTEGER: NUMBER;
 >IDENTIFIER: LETTER+;
 >DOT: '.';
 >fragment NUMBER: DIGIT+;
 >fragment LETTER: 'a' .. 'z';
 >fragment DIGIT: '0' .. '9';

This has been discussed to death before.  For reasons of 
performance (and some other obscure thing, I think), when 
processing a + loop ANTLR will use k=1 lookahead.  Thus when faced 
with the choice between FLOAT and INTEGER, it looks ahead to see 
at least one DIGIT and then says "ok, that's a FLOAT".  It doesn't 
look past all the DIGITs to see whether there's a DOT or 
not.  (Ter has said he might look into improving this a bit in a 
later version.)
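
To make that concrete, here is a rough Python toy model of the decision as I described it above. It is not what ANTLR actually generates; the names (LexError, match_digits, lex_committed_float) are made up for illustration. It commits to FLOAT after seeing one digit and then has no way back:

```python
class LexError(Exception):
    """Raised when the committed alternative cannot be matched."""


def match_digits(text, i):
    """Consume DIGIT+ starting at index i; return the index after the last digit."""
    start = i
    while i < len(text) and text[i].isdigit():
        i += 1
    if i == start:
        raise LexError(f"expected digit at offset {start}")
    return i


def lex_committed_float(text):
    """Toy model only: one character of lookahead, then commit to FLOAT."""
    if not text[:1].isdigit():
        raise LexError("expected digit at offset 0")
    # The choice is made here, after a single digit: FLOAT: NUMBER DOT NUMBER
    i = match_digits(text, 0)
    if text[i:i + 1] != ".":
        raise LexError(f"expected '.' at offset {i}")
    i = match_digits(text, i + 1)
    return ("FLOAT", text[:i])
```

In this model "3.14" lexes as a FLOAT, while "42" and "5.foo" both raise LexError, because the alternative was picked before the DOT (or what follows it) was ever examined.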

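Whenever there's a common prefix in your tokens, you will need to 
combine the rules to remove the ambiguity.  (The separate FLOAT 
rule goes away, so FLOAT has to be declared some other way, e.g. 
in a tokens { } section or as an empty fragment rule, for the 
$type assignment to compile.)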

INTEGER
   : NUMBER
     ( /* nothing afterwards */
     | DOT NUMBER { $type = FLOAT; }
     )
   ;
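
The intent of the combined rule can be sketched in Python as below. This is a hand-written illustration, not ANTLR output, and lex_number is a made-up name. One assumption to flag loudly: the sketch only takes the "DOT NUMBER" branch when the '.' is actually followed by a digit, which is the behaviour you want for input like "5.foo"; it says nothing about whether ANTLR makes that choice on its own.

```python
def lex_number(text, pos=0):
    """Return (token_type, lexeme, next_pos) for a merged INTEGER/FLOAT rule."""
    i = pos
    # NUMBER: DIGIT+
    while i < len(text) and text[i].isdigit():
        i += 1
    if i == pos:
        raise ValueError(f"expected digit at offset {pos}")

    token_type = "INTEGER"  # default: nothing afterwards
    # Optional branch: DOT NUMBER.
    # ASSUMPTION: taken only when '.' is immediately followed by a digit.
    if text[i:i + 1] == "." and text[i + 1:i + 2].isdigit():
        i += 1
        while i < len(text) and text[i].isdigit():
            i += 1
        token_type = "FLOAT"  # plays the role of { $type = FLOAT; }
    return token_type, text[pos:i], i
```

With this, "42" comes back as INTEGER, "3.14" as FLOAT, and "5.foo" as INTEGER "5" with the position left at the '.', so the DOT and the IDENTIFIER can be matched as separate tokens afterwards.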


