[antlr-interest] Lexer ambiguities

Mon Feb 11 07:40:17 PST 2008

On Feb 11, 2008 9:21 AM, shmuel siegel <antlr at shmuelhome.mine.nu> wrote:
>
> Amal Khailtash wrote:
> > How does one resolve lexer ambiguities?  I have a grammar like:
> >
> >   a : NUMBER UNIT ;
> >   b : VALUE NAME ;
> >
> >   NUMBER : ('0'..'9')+ ;
> >   UNIT : 'kg'  | 'lb' ;
> >
> >   VALUE : '0' | '1' ;
> >   NAME : ('!'..'~')+ ;
> >
> > How can I distinguish between a NUMBER and a VALUE and between a UNIT
> > and a NAME?
> > -- Amal
> >
> As you know, the ambiguity at the lexer level is intrinsic. Every VALUE
> is a NUMBER and everything is a NAME. The lexer runs in its own
> environment and therefore the parser rules can't help resolve the lexer
> ambiguities. I would rewrite the rules as follows:
>
>   a : (NUMBER|value) unit ;
>   b : value NAME ;
>
>   unit : 'kg' | 'lb' ;
>   value : '0' | '1' ;
>
>   NUMBER : ('0'..'9')+ ;
>   NAME : (('!'..'/')|(':'..'~'))
>          ('!'..'~')* ;
>
> The trick is that the literals in the parser rules become their own
> tokens, which the lexer matches in preference to the catch-all rules. I
> changed the definition of NAME so that it never begins with a digit;
> otherwise it would be ambiguous with NUMBER. As an added note, you need
> to decide how you want to handle whitespace.

I can't get your ideas above to work. If you tested this, can you send
us your complete grammar file?

Here is my grammar with your ideas incorporated.

grammar NumberValue;

file: (line terminator)*;
line: a | b;
a: (value | NUMBER) unit;
b: value NAME;

unit: 'kg' | 'lb';
value: '0' | '1';

NUMBER: '0'..'9'+;
NAME: ('!'..'/' | ':'..'~' | '!'..'~')+;

terminator: NEWLINE | EOF;
NEWLINE: ('\r'? '\n')+;
WHITESPACE: (' ' | '\t')+ { $channel = HIDDEN; };

Here is my input.

1Mark
19kg
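
To compare the two NAME definitions on this input, here is a rough
simulation in Python. It is only a sketch: it assumes a plain
longest-match lexer in which the parser literals win ties, which is not
necessarily how ANTLR 3 predicts tokens. The names tokenize,
NAME_NO_LEADING_DIGIT and NAME_ANY are made up for this illustration.

```python
import re

# Rules are tried in order; on equal match length the earlier rule wins,
# so the parser literals take precedence over NUMBER and NAME (assumption).
LITERALS = [
    ("'kg'", re.compile(r"kg")),
    ("'lb'", re.compile(r"lb")),
    ("'0'", re.compile(r"0")),
    ("'1'", re.compile(r"1")),
]
NUMBER = ("NUMBER", re.compile(r"[0-9]+"))

# NAME as in the quoted suggestion: the first character is not a digit.
NAME_NO_LEADING_DIGIT = ("NAME", re.compile(r"[!-/:-~][!-~]*"))

# NAME as in the grammar above: the '!'..'~' alternative admits digits
# in every position, so the rule reduces to ('!'..'~')+.
NAME_ANY = ("NAME", re.compile(r"[!-~]+"))


def tokenize(text, name_rule):
    """Split text into (token type, lexeme) pairs by longest match."""
    rules = LITERALS + [NUMBER, name_rule]
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos] in " \t":  # skip whitespace, like the hidden channel
            pos += 1
            continue
        best = None
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Strictly longer matches replace the current best; ties keep
            # the earlier rule.
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


for line in ("1Mark", "19kg"):
    print(line, tokenize(line, NAME_NO_LEADING_DIGIT), tokenize(line, NAME_ANY))
```

Under these assumptions, the no-leading-digit NAME splits "1Mark" into
'1' followed by NAME and "19kg" into NUMBER followed by 'kg', while the
NAME in my grammar swallows each line as a single NAME token.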

-- 
R. Mark Volkmann
Object Computing, Inc.