[antlr-interest] ANTLR vs lex/yacc

Mon Jan 17 02:24:26 PST 2005

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Mike Bresnahan
> Sent: Monday, 17 January 2005 3:41 PM
> To: antlr-interest at antlr.org
> Subject: RE: [antlr-interest] ANTLR vs lex/yacc
> 
> Here's an example of where I find LL(k)/ANTLR obtuse.  Take the following
> standard expression example that is similar to many on the web.
> 
> class ExpressionParser extends Parser;
> 
> expr     : sumExpr SEMI!;
> sumExpr  : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
> prodExpr : powExpr ((MUL^|DIV^|MOD^) powExpr)* ;
> powExpr  : atom (POW^ atom)? ;
> atom     : INT ;
> 
> class ExpressionLexer extends Lexer;
> 
> PLUS  : '+' ;
> MINUS : '-' ;
> MUL   : '*' ;
> DIV   : '/' ;
> MOD   : '%' ;
> POW   : '^' ;
> SEMI  : ';' ;
> protected DIGIT : '0'..'9' ;
> INT   : (DIGIT)+ ;
> 
> 
> What I find really weird about this grammar is that a "sumExpr" can be not
> only "x + y" as I would expect, but it can also be "x * y" and "x^2" and
> "5".  That's unintuitive to me.  I don't remember having this issue with
> LALR/yacc.  Is the unintuitive to me only because I learned it differently
> the first time (i.e. in yacc)?

Well as a top-down parser, you get

01) expr

becomes

02) sumExpr SEMI
03) prodExpr ( (PLUS|MINUS) prodExpr )*
04) powExpr ( (MUL|DIV|MOD) powExpr )* ( (PLUS|MINUS) prodExpr )*
05) atom (POW atom)? ( (MUL|DIV|MOD) powExpr )* ( (PLUS|MINUS) prodExpr )*
06) INT (POW INT)? ( (MUL|DIV|MOD) powExpr )* ( (PLUS|MINUS) prodExpr )*

etc etc

So it matches an integer, optionally followed by '^', '*'\'/'\'%' or '+'\'-'
with precedence in that order and equal precedence between alternatives.

When it reaches the (POW INT)? alternative, the parser looks-ahead by 'k'
tokens to determine if POW, MUL, DIV, MOD, PLUS or MINUS is matched and then
consumes more tokens to match the input to the anticipated tokens in that
rule. Hence, for these rules you only need k=1 to determine which
alternative is successful.

By structuring the rules in this way, it guarantees that a POW will always
be matched before a MUL/DIV/MOD, and those before a PLUS/MINUS. Hence
precedence is correctly dealt with. Left and right associativeness depends
only on your actions or AST construction - you can process in either
direction but the rules given there will build a right-associative AST by
default.

> 
> What's also confusing is that the structure of the grammar above
> determines
> the operator precedence.  With yacc operator precedence is specified more
> clearly via special keywords.  Again perhaps I only think this way because
> I
> learned it differently the first time.
> 

Well this seems okay to me, but then I've never used yacc. Is it really
*that* much easier?

Nigel

--
Nigel Sheridan-Smith
PhD research student

Faculty of Engineering
University of Technology, Sydney
Phone: 02 9514 7946
Fax: 02 9514 2435