[antlr-interest] Problem with lexer rule for an optional suffix

Gavin Lambert antlr at mirality.co.nz
Sat Nov 14 04:11:31 PST 2009


At 22:08 14/11/2009, Scott Oakes wrote:
 >  fragment DIGIT:      '0'..'9';
 >  fragment LETTER: ('a'..'z'|'A'..'Z');
 >
 >  ID:  (LETTER | '.')+ ('.' DIGIT+)?
 >       | DIGIT+
 >      ;
 >
 >The idea is that ID is things like: "foo", "32", "bar.baz", or
 >"foo.bar.32". However with input "foo.bar.32", I get two tokens,
 >"foo.bar." and "32". How could I rewrite this so I get a single ID
 >token, "foo.bar.32"?

The problem here is that loops match greedily whenever possible.  So
in the input "foo.bar.32", the first loop consumes "foo.bar."
(including the final dot), and then the optional clause is skipped
because it would need yet another '.' in the input.  That can never
happen: if another '.' were there, the first loop would have consumed
it too.  The lexer therefore emits "foo.bar." and then matches "32"
as a second ID through the DIGIT+ alternative.

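There are quite a few options for resolving this, depending on
what constructs are legal in your language.  One way is to use a
syntactic predicate, so that the loop only consumes a '.' when a
LETTER follows it and leaves a '.' before digits for the optional
clause: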

   ID : (LETTER | ('.' LETTER) => '.')+ ('.' DIGIT+)?
      | DIGIT+
      ;
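
To see the difference outside ANTLR, here is a rough sketch using
Python regular expressions as a stand-in for the two lexer rules.
This is only an analogy, not how ANTLR works internally: the names
ORIGINAL_ID, PREDICATED_ID and tokens are made up for illustration,
and the regex lookahead (?=[A-Za-z]) plays the role of the
('.' LETTER) => predicate.

```python
import re

# Analogue of the original rule: '.' is part of the greedy loop,
# so the loop swallows the dot that the optional suffix needs.
ORIGINAL_ID = re.compile(r"[A-Za-z.]+(?:\.[0-9]+)?|[0-9]+")

# Analogue of the predicated rule: the loop takes a '.' only when a
# letter follows it (the lookahead consumes nothing).
PREDICATED_ID = re.compile(
    r"(?:[A-Za-z]|\.(?=[A-Za-z]))+(?:\.[0-9]+)?|[0-9]+"
)


def tokens(pattern, text):
    """Return successive matches, loosely imitating a lexer.

    Unlike a real lexer, findall silently skips characters that
    match nothing instead of reporting an error.
    """
    return pattern.findall(text)


if __name__ == "__main__":
    for text in ["foo", "32", "bar.baz", "foo.bar.32"]:
        print(text, tokens(ORIGINAL_ID, text), tokens(PREDICATED_ID, text))
```

With the first pattern "foo.bar.32" splits into "foo.bar." and "32";
with the second it comes out as the single match "foo.bar.32".  Note
that under the predicated rule a '.' is only accepted when a letter or
a digit suffix follows it, so input such as "foo." is no longer
matched as a whole.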


