[antlr-interest] Problem with lexer rule for an optional suffix

David-Sarah Hopwood david-sarah at jacaranda.org
Sat Nov 14 14:26:42 PST 2009


Scott Oakes wrote:
> Hoping for some newbie help on the following lexer.
> 
>   fragment DIGIT:      '0'..'9';
>   fragment LETTER: ('a'..'z'|'A'..'Z');
> 
>   ID:  (LETTER | '.')+ ('.' DIGIT+)?
>        | DIGIT+
>       ;
> 
> The idea is that ID is things like: "foo", "32", "bar.baz", or
> "foo.bar.32". However with input "foo.bar.32", I get two tokens,
> "foo.bar." and "32". How could I rewrite this so I get a single ID
> token, "foo.bar.32"?

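This happens because (LETTER | '.')+ is greedy: it consumes "foo.bar."
including the final '.', and the lexer does not backtrack to give that
'.' up. No '.' remains at that point, so the optional ('.' DIGIT+) matches
nothing, and "32" is then lexed as a separate token by the DIGIT+
alternative.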
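
To make the effect concrete, here is a rough Python model of a greedy,
non-backtracking matcher applied to the original ID rule. It is only a
sketch: the names match_id and tokenize are made up for illustration, and
a real ANTLR-generated lexer is built quite differently.

```python
def is_letter(c):
    return 'a' <= c <= 'z' or 'A' <= c <= 'Z'


def is_digit(c):
    return '0' <= c <= '9'


def match_id(text, pos):
    """Return the end index of an ID starting at pos, or None.

    Hypothetical model of the original rule:
      ID: (LETTER | '.')+ ('.' DIGIT+)? | DIGIT+ ;
    Loops are greedy and never give characters back.
    """
    n = len(text)
    i = pos
    # (LETTER | '.')+ : take every letter and dot available
    while i < n and (is_letter(text[i]) or text[i] == '.'):
        i += 1
    if i > pos:
        # ('.' DIGIT+)? : tried only from where the loop stopped.
        # The loop already consumed every '.', so this test never succeeds.
        if i < n and text[i] == '.':
            k = i + 1
            while k < n and is_digit(text[k]):
                k += 1
            if k > i + 1:
                i = k
        return i
    # Second alternative: DIGIT+
    while i < n and is_digit(text[i]):
        i += 1
    return i if i > pos else None


def tokenize(text):
    """Split text into ID tokens using match_id repeatedly."""
    tokens = []
    pos = 0
    while pos < len(text):
        end = match_id(text, pos)
        if end is None:
            raise ValueError(f"no token matches at position {pos}")
        tokens.append(text[pos:end])
        pos = end
    return tokens


print(tokenize("foo.bar.32"))  # ['foo.bar.', '32']
```

The model reproduces the two tokens reported above, and shows why the
optional suffix in the first alternative is effectively dead.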

There does not appear to be any intended distinction between letters
and digits in your examples. If that is correct, perhaps you want:

  fragment ELEMENT : (LETTER | DIGIT)+ ;
  ID : ELEMENT ('.' ELEMENT)*;

If elements should not contain mixed letters and digits, then use:

  fragment ELEMENT : LETTER+ | DIGIT+ ;
  ID : ELEMENT ('.' ELEMENT)*;

If an ID should allow empty elements (i.e. initial, final, or consecutive
'.' characters), then this would be simpler:

  ID : (LETTER | DIGIT | '.')+;
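
For comparing the three variants above, the following Python sketch writes
each one as an approximately equivalent regular expression and checks which
whole strings it accepts. The names MIXED, UNMIXED, LOOSE and accepts are
made up for illustration, and regex matching is only a stand-in for what
an ANTLR lexer would do with the same rules.

```python
import re

LETTER = r"[A-Za-z]"
DIGIT = r"[0-9]"

# ELEMENT: (LETTER | DIGIT)+ ;  ID: ELEMENT ('.' ELEMENT)* ;
MIXED = re.compile(rf"(?:{LETTER}|{DIGIT})+(?:\.(?:{LETTER}|{DIGIT})+)*")

# ELEMENT: LETTER+ | DIGIT+ ;  ID: ELEMENT ('.' ELEMENT)* ;
UNMIXED = re.compile(rf"(?:{LETTER}+|{DIGIT}+)(?:\.(?:{LETTER}+|{DIGIT}+))*")

# ID: (LETTER | DIGIT | '.')+ ;
LOOSE = re.compile(rf"(?:{LETTER}|{DIGIT}|\.)+")


def accepts(pattern, text):
    """True if the pattern matches the whole of text as a single ID."""
    return pattern.fullmatch(text) is not None


for name, pattern in [("mixed", MIXED), ("unmixed", UNMIXED), ("loose", LOOSE)]:
    results = {s: accepts(pattern, s)
               for s in ["foo.bar.32", "foo1.bar", "foo..bar", ".foo."]}
    print(name, results)
```

All three accept "foo.bar.32" as one ID. Only the first and third accept
"foo1.bar", and only the third accepts "foo..bar" or ".foo.".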

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com
