[antlr-interest] Problem with lexer rule for an optional suffix

Sat Nov 14 09:17:53 PST 2009

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Scott Oakes
> Sent: Saturday, November 14, 2009 1:08 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Problem with lexer rule for an optional suffix
> 
> Hoping for some newbie help on the following lexer.
> 
>   fragment DIGIT:      '0'..'9';
>   fragment LETTER: ('a'..'z'|'A'..'Z');
> 
>   ID:  (LETTER | '.')+ ('.' DIGIT+)?
>        | DIGIT+
>       ;

Well, this rule is wrong. It allows:

.....99
A..44

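But not A.56 as a single token: the (LETTER | '.')+ loop greedily consumes the '.', so the optional ('.' DIGIT+)? suffix never gets a chance.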

You would need:

ID : (LETTER+) (('.' LETTER)=>'.' LETTER+)* (('.' DIGIT)=> '.' DIGIT+)? ;
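To check what the corrected rule accepts, here is a rough Python sketch (not ANTLR-generated code) that writes the same rule as a regular expression. It assumes ASCII letters and digits, as in the LETTER and DIGIT fragments, and that ID is the only rule in play; `is_single_id` and `longest_id_prefix` are made-up names for illustration. Since each predicate only looks two characters ahead, the regex should behave like the lexer rule here, but treat that as an assumption rather than something ANTLR guarantees.

```python
import re

# Regex form of the corrected rule, assuming ASCII LETTER/DIGIT fragments:
#   ID : (LETTER+) (('.' LETTER)=>'.' LETTER+)* (('.' DIGIT)=> '.' DIGIT+)? ;
ID_PATTERN = re.compile(r"[A-Za-z]+(?:\.[A-Za-z]+)*(?:\.[0-9]+)?")


def is_single_id(text):
    """True if the whole string would lex as one ID token."""
    return ID_PATTERN.fullmatch(text) is not None


def longest_id_prefix(text):
    """Text the ID rule would consume from the start of the input ('' if none)."""
    m = ID_PATTERN.match(text)
    return m.group(0) if m else ""


print(is_single_id("A.56"))           # True
print(is_single_id("A..44"))          # False
print(is_single_id(".....99"))        # False
print(longest_id_prefix("a.b4.f.5"))  # a.b
```

The last call shows where a lexer built only from this rule stops on a.b4.f.5: ID ends after a.b, and if no other rule matches a digit, the '4' becomes a lexer error.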

But you really want to do this kind of thing in the parser, as you usually want to dissect the identifier anyway. If a part of the ID can only be numbers, then you could do it in the lexer, but then any errors will be reported by the lexer and will be very confusing.

The general idea is to make the lexer cover every input it could be given, so that it never issues messages itself, and to keep context out of the lexer. Then, in the parser, defer as much error handling as possible to the tree walker. This way you get much better error messages. With your example, here is what each stage would report:

a.b4.f.5

Lexer: Unexpected character at '4'
Parser: Extraneous token '4'
Walker (Though you can do this one in the parser): 'b4' is not a valid component of multipart identifier

So:

ID : LETTER+;
NUM : DIGIT+;
DOT : '.';
id : id_part (DOT^ id_part)*  { actions to check in Java go here if you have no tree walker } ;
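Here is a Python sketch of the split approach, standing in for the Java action or tree-walker check. It assumes DOT is a token matching a single '.' and that each component is the token text between dots; `tokenize` and `check_multipart_id` are hypothetical names, and the rule that only the last component may be numeric comes from the optional suffix in the original question.

```python
import re

# Token rules mirroring the split lexer: ID : LETTER+ ; NUM : DIGIT+ ;
# DOT is assumed to be a token matching a single '.'.
TOKEN_RE = re.compile(r"(?P<ID>[A-Za-z]+)|(?P<NUM>[0-9]+)|(?P<DOT>\.)")


def tokenize(text):
    """Return (kind, text) pairs; only characters no rule covers raise an error."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def check_multipart_id(tokens):
    """Hypothetical semantic check, run after parsing instead of in the lexer.

    Every component must be a single ID, except the last, which may be a NUM.
    """
    components = [[]]  # token lists between DOTs
    for kind, text in tokens:
        if kind == "DOT":
            components.append([])
        else:
            components[-1].append((kind, text))

    errors = []
    for i, parts in enumerate(components):
        kinds = [kind for kind, _ in parts]
        is_last = i == len(components) - 1
        if kinds == ["ID"] or (is_last and kinds == ["NUM"]):
            continue
        text = "".join(t for _, t in parts)
        errors.append(f"'{text}' is not a valid component of multipart identifier")
    return errors


for message in check_multipart_id(tokenize("a.b4.f.5")):
    print(message)  # 'b4' is not a valid component of multipart identifier
```

The tokenizer never complains about letters, digits or dots, so the only message for your example comes from the later check, and it names the whole offending component rather than a single character.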

Jim