[antlr-interest] Problem with lexer rule for an optional suffix
David-Sarah Hopwood
david-sarah at jacaranda.org
Sat Nov 14 14:26:42 PST 2009
Scott Oakes wrote:
> Hoping for some newbie help on the following lexer.
>
> fragment DIGIT: '0'..'9';
> fragment LETTER: ('a'..'z'|'A'..'Z');
>
> ID: (LETTER | '.')+ ('.' DIGIT+)?
> | DIGIT+
> ;
>
> The idea is that ID is things like: "foo", "32", "bar.baz", or
> "foo.bar.32". However with input "foo.bar.32", I get two tokens,
> "foo.bar." and "32". How could I rewrite this so I get a single ID
> token, "foo.bar.32"?
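This happens because (LETTER | '.')+ greedily matches "foo.bar.",
including the final '.'. No '.' is then left for the optional
('.' DIGIT+) suffix, so the suffix is skipped, the ID token ends after
"foo.bar.", and "32" is matched as a separate token by the DIGIT+
alternative.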
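The same tokenization can be reproduced outside ANTLR with a rough Python analogy. This is only a sketch: Python's re module stands in for the lexer rule, and the names ORIGINAL_ID and tokenize are made up for illustration. ANTLR's lexer is not a regex engine, but the greedy loop swallows the final '.' in the same way here.

```python
import re

# Regex analogue of the original rule (an assumption, not ANTLR's algorithm):
#   ID: (LETTER | '.')+ ('.' DIGIT+)? | DIGIT+ ;
ORIGINAL_ID = re.compile(r"[A-Za-z.]+(?:\.[0-9]+)?|[0-9]+")

def tokenize(pattern, text):
    """Repeatedly match `pattern` at the current position, like a simple lexer."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            raise ValueError(f"no token at position {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens

print(tokenize(ORIGINAL_ID, "foo.bar.32"))  # ['foo.bar.', '32']
```

The greedy character class takes "foo.bar." in one go, the optional suffix then finds "3" where it needs ".", and the second pass picks up "32" through the digits-only alternative.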
There does not appear to be any intended distinction between letters
and digits in your examples. If that is correct, perhaps you want:
fragment ELEMENT: (LETTER | DIGIT)+;
ID : ELEMENT ('.' ELEMENT)*;
If elements should not contain mixed letters and digits, then use:
fragment ELEMENT : LETTER+ | DIGIT+ ;
ID : ELEMENT ('.' ELEMENT)*;
If an ID should allow empty elements (i.e. initial, final, or consecutive
'.' characters), then this would be simpler:
ID : (LETTER | DIGIT | '.')+;
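To compare what the three suggested rules accept as a single ID, here is a small Python sketch using regex analogues. The names MIXED, UNMIXED, LOOSE and is_single_id are hypothetical, and a whole-string regex match is only an approximation of how an ANTLR lexer would tokenize a longer input.

```python
import re

# Regex analogues of the three suggested ID rules (illustrative names only).
# ELEMENT: (LETTER | DIGIT)+ ;  ID: ELEMENT ('.' ELEMENT)* ;
MIXED = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")
# ELEMENT: LETTER+ | DIGIT+ ;   ID: ELEMENT ('.' ELEMENT)* ;
UNMIXED = re.compile(r"(?:[A-Za-z]+|[0-9]+)(?:\.(?:[A-Za-z]+|[0-9]+))*")
# ID: (LETTER | DIGIT | '.')+ ;
LOOSE = re.compile(r"[A-Za-z0-9.]+")

def is_single_id(pattern, text):
    """True if the whole of `text` is accepted as one ID by `pattern`."""
    return pattern.fullmatch(text) is not None

# Print, for each sample, whether MIXED, UNMIXED and LOOSE accept it.
for text in ["foo", "32", "bar.baz", "foo.bar.32", "foo2.bar", "foo..bar."]:
    print(text, [is_single_id(p, text) for p in (MIXED, UNMIXED, LOOSE)])
```

All three accept the four examples from the question. They differ on "foo2.bar", which only the unmixed variant rejects, and on "foo..bar.", which only the loose variant accepts.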
--
David-Sarah Hopwood ⚥ http://davidsarah.livejournal.com