[antlr-interest] Lexer ambiguities
Mark Volkmann
r.mark.volkmann at gmail.com
Mon Feb 11 07:40:17 PST 2008
On Feb 11, 2008 9:21 AM, shmuel siegel <antlr at shmuelhome.mine.nu> wrote:
>
> Amal Khailtash wrote:
> > How does one resolve lexer ambiguities? I have a grammar like:
> >
> > a : NUMBER UNIT ;
> > b : VALUE NAME ;
> >
> > NUMBER : ('0'..'9')+ ;
> > UNIT : 'kg' | 'lb' ;
> >
> > VALUE : '0' | '1' ;
> > NAME : ('!'..'~')+ ;
> >
> > How can I distinguish between a NUMBER and a VALUE and between a UNIT
> > and a NAME?
> > -- Amal
> >
> As you know, the ambiguity at the lexer level is intrinsic. Every VALUE
> is a NUMBER and everything is a NAME. The lexer runs in its own
> environment and therefore the parser rules can't help resolve the lexer
> ambiguities. I would rewrite the rules as follows:
>
> a : (NUMBER|value) unit ;
> b : value NAME ;
>
> unit: 'kg' | 'lb' ;
> value: '0' | '1';
>
> NUMBER : ('0'..'9')+ ;
> NAME : (('!'..'/')|(':'..'~')) ('!'..'~')* ;
>
> The trick is that the literals used in the parser rules ('kg', 'lb',
> '0', '1') get their own token types, and the lexer prefers them over
> the catch-all rules NUMBER and NAME. I changed the definition of NAME
> so that it never begins with a digit; otherwise it is ambiguous with
> NUMBER. As an added note, you need to decide how you want to handle
> whitespace.
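
To make the overlap described above concrete, here is a minimal sketch in Python rather than ANTLR. It transcribes the four token rules from the original grammar into regular expressions and lists every rule that matches a whole input string. The transcription and the helper name matching_rules are assumptions of this sketch; it only compares character sets and does not model how the ANTLR lexer actually chooses between rules.

```python
import re

# Token rules from the original grammar, transcribed by hand as regexes.
# This transcription is an assumption of the sketch, not ANTLR output.
TOKEN_RULES = {
    "NUMBER": r"[0-9]+",   # ('0'..'9')+
    "UNIT": r"kg|lb",      # 'kg' | 'lb'
    "VALUE": r"[01]",      # '0' | '1'
    "NAME": r"[!-~]+",     # ('!'..'~')+
}


def matching_rules(text):
    """Return the names of all token rules that match the entire text."""
    return [name for name, pattern in TOKEN_RULES.items()
            if re.fullmatch(pattern, text)]


for sample in ["1", "19", "kg", "Mark"]:
    print(sample, matching_rules(sample))
```

Under these assumptions, "1" is matched by NUMBER, VALUE and NAME, "19" by NUMBER and NAME, "kg" by UNIT and NAME, and "Mark" by NAME alone, which is the "every VALUE is a NUMBER and everything is a NAME" situation.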

I can't get your ideas above to work. If you tested this, can you send
us your complete grammar file?

Here is my grammar with your ideas incorporated.

grammar NumberValue;
file: (line terminator)*;
line: a | b;
a: (value | NUMBER) unit;
b: value NAME;
unit: 'kg' | 'lb';
value: '0' | '1';
NUMBER: '0'..'9'+;
NAME: ('!'..'/' | ':'..'~' | '!'..'~')+;
terminator: NEWLINE | EOF;
NEWLINE: ('\r'? '\n')+;
WHITESPACE: (' ' | '\t')+ { $channel = HIDDEN; };

Here is my input.

1Mark
19kg
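
One difference that may be worth checking is that the NAME rule in the grammar above is not the same as the NAME rule in the quoted suggestion. A minimal sketch in Python (not ANTLR) compares the two character classes against the input lines. The regular expressions are hand transcriptions and the helper full_match is hypothetical; the sketch says nothing about how ANTLR resolves the choice between token rules.

```python
import re

# NAME from the quoted suggestion: the first character is never a digit.
#   (('!'..'/')|(':'..'~')) ('!'..'~')*
NAME_QUOTED = re.compile(r"[!-/:-~][!-~]*")

# NAME from the grammar above: the third alternative '!'..'~' includes the
# digits, so the rule as a whole accepts the same strings as ('!'..'~')+.
#   ('!'..'/' | ':'..'~' | '!'..'~')+
NAME_ABOVE = re.compile(r"(?:[!-/]|[:-~]|[!-~])+")


def full_match(pattern, text):
    """Return True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None


for line in ["1Mark", "19kg"]:
    print(line, full_match(NAME_QUOTED, line), full_match(NAME_ABOVE, line))
```

In this sketch both input lines are rejected by the quoted NAME, because they start with a digit, but are accepted in full by the NAME in the grammar above, so that rule can still overlap with NUMBER and with the '0' and '1' literals.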
--
R. Mark Volkmann
Object Computing, Inc.