[antlr-interest] Lexer ambiguities

Mark Volkmann r.mark.volkmann at gmail.com
Sun Feb 10 14:00:10 PST 2008


On Feb 10, 2008 11:59 AM, Amal Khailtash <akhailtash at gmail.com> wrote:
> How does one resolve lexer ambiguities?  I have a grammar like:
>
>   a : NUMBER UNIT ;
>   b : VALUE NAME ;
>
>   NUMBER : ('0'..'9')+ ;
>   UNIT : 'kg'  | 'lb' ;
>
>   VALUE : '0' | '1' ;
>    NAME : ('!'..'~')+ ;
>
> How can I distinguish between a NUMBER and a VALUE and between a UNIT and a
> NAME?

I believe the key is that the order of lexer rules is significant:
when more than one rule can match the same text, the rule listed
first wins. Put the VALUE rule before the NUMBER rule and the UNIT
rule before the NAME rule so that the more specific rule wins.
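
As a rough illustration of that tie-break, here is a small Python sketch. It is not ANTLR's actual prediction algorithm; it only lists which rules can match a whole lexeme and picks the first one listed. The RULES table and the helper names are hypothetical, invented for this example, and the patterns are transcribed from the lexer rules in the question.

```python
import re

# Lexer rules in the suggested order: specific rules before general ones.
# Hypothetical table for illustration; this is not how ANTLR stores rules.
RULES = [
    ("VALUE", r"[01]"),
    ("NUMBER", r"[0-9]+"),
    ("UNIT", r"kg|lb"),
    ("NAME", r"[!-~]+"),
]

def matching_rules(lexeme, rules=RULES):
    """Return the names of all rules that match the whole lexeme, in rule order."""
    return [name for name, pattern in rules if re.fullmatch(pattern, lexeme)]

def first_matching_rule(lexeme, rules=RULES):
    """Model the tie-break: among the matching rules, the first listed wins."""
    names = matching_rules(lexeme, rules)
    return names[0] if names else None

for lexeme in ["1", "19", "kg", "Amal"]:
    print(lexeme, matching_rules(lexeme), "->", first_matching_rule(lexeme))
```

In this model a lone 1 is matched by VALUE, NUMBER and NAME, so with VALUE listed first it always comes out as VALUE. Input such as "1 kg" would therefore not fit rule a.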

I can get your grammar to work if the input has spaces between the
parts of a and b, like this:

19 kg
1 Amal

I'm not sure if there's a way to make this work without the spaces.
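
One way to see why the spaces matter is a toy tokenizer in Python. It assumes the lexer takes the longest match at each position and breaks ties by rule order, which is a simplification and not necessarily how ANTLR decides. RULES and tokenize are hypothetical names for this sketch. Under that assumption, NAME accepts every printable non-space character, so unspaced input is swallowed as a single NAME.

```python
import re

# Rule order as suggested: VALUE before NUMBER, UNIT before NAME.
# Hypothetical table; patterns transcribed from the grammar in this thread.
RULES = [
    ("VALUE", re.compile(r"[01]")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("UNIT", re.compile(r"kg|lb")),
    ("NAME", re.compile(r"[!-~]+")),
    ("WHITESPACE", re.compile(r"[ \t]+")),
]

def tokenize(text):
    """Assumed model: longest match wins, ties go to the rule listed first."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the earlier rule is kept.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WHITESPACE":  # whitespace is skipped, not emitted
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens

print(tokenize("19 kg"))
print(tokenize("1 Amal"))
print(tokenize("19kg"))
```

With the space, "19 kg" splits into a NUMBER and a UNIT. Without it, NAME matches all four characters of "19kg" and is longer than any other candidate, so neither rule a nor rule b can apply.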

Here's my version of the grammar.

grammar NumberValue;

file: line+;
line: (a | b) terminator;

a: NUMBER UNIT;
b: VALUE NAME;

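// Lexer rule order matters: the more specific rule goes first.
VALUE: '0' | '1';
NUMBER: '0'..'9'+;

UNIT: 'kg' | 'lb';
NAME: '!'..'~'+;

terminator: NEWLINE | EOF;
NEWLINE: ('\r'? '\n')+;
// Whitespace goes to the hidden channel, so the parser never sees it.
WHITESPACE: (' ' | '\t')+ { $channel = HIDDEN; };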

-- 
R. Mark Volkmann
Object Computing, Inc.

