[antlr-interest] Ambiguity in grammar

Sun Mar 20 23:11:01 PDT 2011

On 03/21/2011 12:26 AM, Wojciech Tomasz Cichon wrote:
> i have in my grammar rules:
> 
> stmt :
> | ident '=' lexp SEMI  -> ^(SET ident lexp);
> 
> 
> factor  : 
>       '-'?  (NUMBER |ident )^
>         ....
>        ;
> 
> and
> lexp : term (SIMOP^  lexp)?;
> term  : factor (OP^  term)?;
> 
> OP :  '*' | '/' | '%';
> SIMOP : '+' | '-';
> 
> and i tried it on different inputs
> and for
> ID = -5; , ID = 5+3; etc. it works, and it builds the correct tree
> but if i try
> ID = 5-3;
> i get error: 
> mismatched input '-' expecting SEMI
> 
> i’m using options:
> options {
>   language = Java;
>   output = AST;
>     k  = 3;
> }
> 
> can anyone tell me what i should fix?

'-' can't be used both as a literal in your parser's factor rule and as
part of the SIMOP token in the lexer.  The lexer gives each '-' exactly
one token type, so you can't have it both ways.
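
To illustrate the point, here is a toy Java model of a lexer (it is not
ANTLR's generated code).  It assumes the rule for the '-' literal used in
factor is tried before SIMOP, which is consistent with the reported
symptoms.  ToyLexer and MINUS_LITERAL are made-up names for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyLexer {
    // Token rules in priority order: the first rule whose character set
    // contains the input character wins.  MINUS_LITERAL is a hypothetical
    // stand-in for the token created by the '-' literal in factor.
    private static final String[][] RULES = {
        {"MINUS_LITERAL", "-"},
        {"SIMOP", "+-"},
        {"OP", "*/%"},
        {"SEMI", ";"},
    };

    // Returns the token types for the input, one type per token.
    public static List<String> tokenize(String input) {
        List<String> types = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            if (Character.isDigit(c)) {
                // Consume a whole run of digits as one NUMBER token.
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                types.add("NUMBER");
                continue;
            }
            String type = null;
            for (String[] rule : RULES) {
                if (rule[1].indexOf(c) >= 0) {
                    type = rule[0];
                    break; // first match wins; later rules are never consulted
                }
            }
            if (type == null) {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
            types.add(type);
            i++;
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("5+3;")); // '+' can only be SIMOP
        System.out.println(tokenize("5-3;")); // '-' is never SIMOP here
        System.out.println(tokenize("-5;"));
    }
}
```

In this model "5-3;" lexes as NUMBER MINUS_LITERAL NUMBER SEMI, so a
parser waiting for SIMOP after the first term never sees one.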

lexp : term ( ('+'^ | '-'^) lexp)?;
term : factor ( ('*'^ | '/'^ | '%'^) term )?;

may be closer to what you want, but I think you really want:

lexp    : term ( ( PLUS^ | MINUS^ ) term )*
        ;
term    : factor ( ( STAR^ | SLASH^ | PERCENT^ ) factor)*
        ;
factor  : ( MINUS^ )? ( NUMBER | ident)
        ;

PLUS    : '+';
MINUS   : '-';
STAR    : '*';
SLASH   : '/';
PERCENT : '%';

(Change the token names as you see fit.)

If you do it this way, '-' is always a MINUS token in the lexer and you
can use MINUS in different rules in the parser.  (Note that your token
definitions of OP and SIMOP get expanded into the subrules in the
parser, which is necessary if you want them to be the roots of your
generated AST.)
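
As a rough illustration of the idea, here is a hand-written Java sketch
of the lexp/term/factor rules above (it is not ANTLR output).  MiniExpr,
lex and parse are hypothetical names, ident is left out for brevity, and
trees are printed as prefix strings instead of real AST nodes.

```java
import java.util.ArrayList;
import java.util.List;

public class MiniExpr {
    private final List<String> tokens;
    private int pos = 0;

    private MiniExpr(List<String> tokens) {
        this.tokens = tokens;
    }

    // Lexer: '-' always becomes the same token, whatever role it plays later.
    static List<String> lex(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                out.add(input.substring(start, i));
            } else if ("+-*/%".indexOf(c) >= 0) {
                out.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out;
    }

    // Current token, or "" at end of input.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "";
    }

    // lexp : term ( ( PLUS^ | MINUS^ ) term )*
    private String lexp() {
        String left = term();
        while (peek().equals("+") || peek().equals("-")) {
            String op = tokens.get(pos++);
            left = "(" + op + " " + left + " " + term() + ")"; // operator becomes the new root
        }
        return left;
    }

    // term : factor ( ( STAR^ | SLASH^ | PERCENT^ ) factor )*
    private String term() {
        String left = factor();
        while (peek().equals("*") || peek().equals("/") || peek().equals("%")) {
            String op = tokens.get(pos++);
            left = "(" + op + " " + left + " " + factor() + ")";
        }
        return left;
    }

    // factor : ( MINUS^ )? NUMBER      (ident omitted in this sketch)
    private String factor() {
        boolean negate = peek().equals("-"); // the parser decides this MINUS is unary
        if (negate) {
            pos++;
        }
        String t = peek();
        if (t.isEmpty() || !Character.isDigit(t.charAt(0))) {
            throw new IllegalArgumentException("expected NUMBER but found '" + t + "'");
        }
        pos++;
        return negate ? "(- " + t + ")" : t;
    }

    public static String parse(String input) {
        MiniExpr p = new MiniExpr(lex(input));
        String tree = p.lexp();
        if (p.pos < p.tokens.size()) {
            throw new IllegalArgumentException("unexpected token '" + p.peek() + "'");
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("5-3"));   // binary minus
        System.out.println(parse("-5"));    // unary minus
        System.out.println(parse("8-3-2")); // left-associative grouping
    }
}
```

The loop form also groups "8-3-2" as (8-3)-2, whereas a right-recursive
rule such as lexp : term (SIMOP^ lexp)? would group it as 8-(3-2).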

> regards

-- 
Kevin J. Cummings
kjchome at verizon.net
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)