[antlr-interest] Lexer ambiguities
shmuel siegel
antlr at shmuelhome.mine.nu
Mon Feb 11 07:21:01 PST 2008
Amal Khailtash wrote:
> How does one resolve lexer ambiguities? I have a grammar like:
>
> a : NUMBER UNIT ;
> b : VALUE NAME ;
>
> NUMBER : ('0'..'9')+ ;
> UNIT : 'kg' | 'lb' ;
>
> VALUE : '0' | '1' ;
> NAME : ('!'..'~')+ ;
>
> How can I distinguish between a NUMBER and a VALUE and between a UNIT
> and a NAME?
> -- Amal
>
As you know, the ambiguity at the lexer level is intrinsic. Every VALUE
is also a NUMBER, and every token in your grammar also matches NAME. The
lexer tokenizes the input on its own, without knowing which parser rule
is active, so the parser rules can't help resolve the lexer
ambiguities. I would rewrite the rules as follows:

a : (NUMBER|value) unit ;
b : value NAME ;
unit : 'kg' | 'lb' ;
value : '0' | '1' ;

NUMBER : ('0'..'9')+ ;
NAME : (('!'..'/')|(':'..'~')) ('!'..'~')* ;

The trick is that the literals used in the parser rules ('kg', 'lb',
'0', '1') become tokens of their own, and for those exact inputs the
lexer picks them over the catchall rules NUMBER and NAME. I changed the
definition of NAME so that it never begins with a digit. Otherwise I ran
into an ambiguity with NUMBER. As an added note, you need to decide how
you want to handle whitespace.
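
To make the override behavior concrete, here is a small hand-written
Python model of the tokenization I would expect from the rewritten
grammar. It is only a sketch and not ANTLR-generated code: it assumes
the longest match wins, that ties go to the literals, and that
whitespace is simply skipped. The names TOKEN_SPEC and tokenize are made
up for this illustration.

```python
import re

# Hypothetical token table, highest priority first: the parser-rule
# literals, then NUMBER, then the catchall NAME (which may not start
# with a digit).
TOKEN_SPEC = [
    ("'kg'", re.compile(r"kg")),
    ("'lb'", re.compile(r"lb")),
    ("'0'", re.compile(r"0")),
    ("'1'", re.compile(r"1")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("NAME", re.compile(r"[!-/:-~][!-~]*")),
]


def tokenize(text):
    """Return (token_type, lexeme) pairs for text.

    Assumption: longest match wins; on a tie the earlier (higher
    priority) entry in TOKEN_SPEC wins. Whitespace is skipped.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None  # (token_type, end_offset) of the best match so far
        for kind, pattern in TOKEN_SPEC:
            m = pattern.match(text, pos)
            # Strictly longer matches replace the current best, so ties
            # stay with the higher-priority rule.
            if m and (best is None or m.end() > best[1]):
                best = (kind, m.end())
        if best is None:
            raise ValueError("no token matches at offset %d" % pos)
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


if __name__ == "__main__":
    for sample in ("12 kg", "1 foo", "kgs", "0 lb"):
        print(sample, "->", tokenize(sample))
```

Under these assumptions "kg" comes out as the literal, "kgs" falls
through to NAME, "1" is the literal and "12" is a NUMBER, which is why
rule a accepts (NUMBER|value) rather than NUMBER alone.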