[antlr-interest] Lexer ambiguities

Mon Feb 11 07:21:01 PST 2008

Amal Khailtash wrote:
> How does one resolve lexer ambiguities?  I have a grammar like:
>
>   a : NUMBER UNIT ;
>   b : VALUE NAME ;
>
>   NUMBER : ('0'..'9')+ ;
>   UNIT : 'kg'  | 'lb' ;
>
>   VALUE : '0' | '1' ;
>   NAME : ('!'..'~')+ ;
>
> How can I distinguish between a NUMBER and a VALUE and between a UNIT 
> and a NAME?
> -- Amal
>
As you know, the ambiguity at the lexer level is intrinsic: every VALUE 
is also a NUMBER, and every one of these tokens is also a NAME. The 
lexer runs independently of the parser, so the parser rules can't help 
resolve lexer ambiguities. I would rewrite the rules as follows:

  a : (NUMBER | value) unit ;
  b : value NAME ;

  unit  : 'kg' | 'lb' ;
  value : '0' | '1' ;

  NUMBER : ('0'..'9')+ ;
  NAME   : (('!'..'/') | (':'..'~'))
           ('!'..'~')* ;
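
To make the overlap concrete, here is a small Python sketch (not ANTLR, 
and not part of the grammar) that models each lexer rule as a regular 
expression and lists which rules accept a given piece of input in full. 
The dictionaries ORIGINAL and REWRITTEN and the helper matching_rules 
are names I made up for illustration; the patterns are my transcription 
of the character ranges in the rules above.

```python
import re

# Lexer rules from the original question, transcribed as regexes.
ORIGINAL = {
    "NUMBER": r"[0-9]+",
    "UNIT": r"kg|lb",
    "VALUE": r"0|1",
    "NAME": r"[!-~]+",
}

# Catch-all lexer rules from the rewritten grammar.
# NAME may no longer start with a digit ('0'..'9').
REWRITTEN = {
    "NUMBER": r"[0-9]+",
    "NAME": r"[!-/:-~][!-~]*",
}


def matching_rules(rules, text):
    """Return the names of all rules whose pattern matches the whole text."""
    return [name for name, pattern in rules.items()
            if re.fullmatch(pattern, text)]


if __name__ == "__main__":
    for sample in ("0", "42", "kg", "abc"):
        print(sample, matching_rules(ORIGINAL, sample),
              matching_rules(REWRITTEN, sample))
```

With the original rules, "0" is accepted by NUMBER, VALUE and NAME at 
once; with the rewritten rules, each sample is accepted by exactly one 
of NUMBER and NAME.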

The trick is that the literals used in the parser rules ('kg', 'lb', 
'0' and '1') become tokens of their own, and the lexer prefers them 
over the catch-all rules NUMBER and NAME. I also changed the definition 
of NAME so that it never begins with a digit; otherwise I ran into an 
ambiguity with NUMBER. As an added note, you still need to decide how 
you want to handle whitespace.
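
For illustration only, the intended behaviour can be approximated with 
a hand-written tokenizer in Python. This is a rough model under my own 
assumptions, not what ANTLR generates: it takes the longest NUMBER or 
NAME match at each position, reports the lexeme as a literal token when 
it is exactly one of 'kg', 'lb', '0' or '1', and simply skips 
whitespace (one possible choice among several). The names LITERALS and 
tokenize are hypothetical.

```python
import re

# Literals that appear in the parser rules `unit` and `value`.
LITERALS = {"kg", "lb", "0", "1"}

# Catch-all lexer rules; their first characters are disjoint,
# so at most one of them can match at any position.
NUMBER = re.compile(r"[0-9]+")
NAME = re.compile(r"[!-/:-~][!-~]*")
WHITESPACE = re.compile(r"\s+")


def tokenize(text):
    """Split text into (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()  # assumption: whitespace is discarded
            continue
        for kind, pattern in (("NUMBER", NUMBER), ("NAME", NAME)):
            m = pattern.match(text, pos)
            if m:
                break
        else:
            raise ValueError(f"unexpected character at offset {pos}")
        lexeme = m.group()
        # An exact literal takes priority over the catch-all rule.
        if lexeme in LITERALS:
            kind = "LITERAL"
        tokens.append((kind, lexeme))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("12 kg"))
    print(tokenize("0 foo"))
    print(tokenize("kgs"))
```

In this model "12 kg" yields a NUMBER followed by the literal 'kg', 
"0 foo" yields the literal '0' followed by a NAME, and "kgs" stays a 
single NAME because the literal only wins on an exact match.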