[antlr-interest] Problem with lexer rule for an optional suffix
Gavin Lambert
antlr at mirality.co.nz
Sat Nov 14 04:11:31 PST 2009
At 22:08 14/11/2009, Scott Oakes wrote:
> fragment DIGIT: '0'..'9';
> fragment LETTER: ('a'..'z'|'A'..'Z');
>
> ID: (LETTER | '.')+ ('.' DIGIT+)?
> | DIGIT+
> ;
>
>The idea is that ID is things like: "foo", "32", "bar.baz", or
>"foo.bar.32". However with input "foo.bar.32", I get two tokens,
>"foo.bar." and "32". How could I rewrite this so I get a single ID
>token, "foo.bar.32"?
The problem here is that loops match greedily whenever possible. So
in the input "foo.bar.32", the first loop consumes "foo.bar.", and
then the optional clause is skipped because it would require yet
another '.' in the input (which can't ever happen, because if it
were there then the first loop would have consumed that too). The
leftover "32" is then lexed as a separate token by the DIGIT+
alternative.
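To make the greedy behaviour concrete, here is a rough Python
simulation of the original rule. It is a hand-written sketch, not
ANTLR-generated code: the names match_id_original and tokenize are
hypothetical, and it only models the greedy loop described above,
not ANTLR's full prediction logic.

```python
# Hand-written simulation (not ANTLR output) of the original rule:
#   ID : (LETTER | '.')+ ('.' DIGIT+)? | DIGIT+ ;

def is_letter(c: str) -> bool:
    # LETTER : 'a'..'z' | 'A'..'Z'
    return 'a' <= c <= 'z' or 'A' <= c <= 'Z'

def is_digit(c: str) -> bool:
    # DIGIT : '0'..'9'
    return '0' <= c <= '9'

def match_id_original(text: str, pos: int) -> int:
    """Return the end index of an ID starting at pos (pos if none)."""
    i = pos
    if i < len(text) and (is_letter(text[i]) or text[i] == '.'):
        # Greedy (LETTER | '.')+ loop: swallows every letter AND every dot.
        while i < len(text) and (is_letter(text[i]) or text[i] == '.'):
            i += 1
        # Optional ('.' DIGIT+)? suffix: needs a '.', but the loop has
        # already consumed all of them, so this branch never fires.
        if i + 1 < len(text) and text[i] == '.' and is_digit(text[i + 1]):
            i += 1
            while i < len(text) and is_digit(text[i]):
                i += 1
        return i
    # Second alternative: DIGIT+
    while i < len(text) and is_digit(text[i]):
        i += 1
    return i

def tokenize(text: str, match) -> list:
    """Repeatedly apply the matcher, collecting the matched tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        end = match(text, pos)
        if end == pos:
            raise ValueError(f"no token at position {pos}")
        tokens.append(text[pos:end])
        pos = end
    return tokens

print(tokenize("foo.bar.32", match_id_original))  # ['foo.bar.', '32']
```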
There are quite a few options for resolving this, depending on
what constructs are legal in your language. One way is to use a
syntactic predicate:
ID : (LETTER | ('.' LETTER) => '.')+ ('.' DIGIT+)?
| DIGIT+
;
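A matching Python sketch of the predicated rule, under the same
assumptions (hand-written, hypothetical names, not ANTLR output).
The loop only takes a '.' when the next character is a LETTER, which
is what the ('.' LETTER) => '.' predicate expresses, so the final
'.' is left over for the ('.' DIGIT+)? suffix.

```python
# Hand-written simulation (not ANTLR output) of the predicated rule:
#   ID : (LETTER | ('.' LETTER) => '.')+ ('.' DIGIT+)? | DIGIT+ ;

def is_letter(c: str) -> bool:
    return 'a' <= c <= 'z' or 'A' <= c <= 'Z'

def is_digit(c: str) -> bool:
    return '0' <= c <= '9'

def match_id_predicated(text: str, pos: int) -> int:
    """Return the end index of an ID starting at pos (pos if none)."""

    def loop_can_continue(j: int) -> bool:
        if j >= len(text):
            return False
        if is_letter(text[j]):
            return True
        # Predicate: only take the '.' if a LETTER follows it.
        return text[j] == '.' and j + 1 < len(text) and is_letter(text[j + 1])

    i = pos
    if loop_can_continue(i):
        while loop_can_continue(i):
            i += 1
        # Now the dot before the digits is still available for the suffix.
        if i + 1 < len(text) and text[i] == '.' and is_digit(text[i + 1]):
            i += 1
            while i < len(text) and is_digit(text[i]):
                i += 1
        return i
    # Second alternative: DIGIT+
    while i < len(text) and is_digit(text[i]):
        i += 1
    return i

def tokenize(text: str, match) -> list:
    """Repeatedly apply the matcher, collecting the matched tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        end = match(text, pos)
        if end == pos:
            raise ValueError(f"no token at position {pos}")
        tokens.append(text[pos:end])
        pos = end
    return tokens

print(tokenize("foo.bar.32", match_id_predicated))  # ['foo.bar.32']
```

Note that the predicate also narrows what the loop accepts: a '.'
must now be followed by a letter (or start the digit suffix), so
inputs such as "foo..bar" or a trailing "foo." no longer lex as a
single ID. Whether that matters depends on what your language allows.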