[antlr-interest] left-recursive expression rule prototype

Sat Feb 19 18:21:31 PST 2011

howdy. Finally got around to supporting left-recursion in limited circumstances. I started looking at it 2008:

http://www.antlr.org/wiki/display/~admin/2008/03/23/Faster+expression+parsing+for+ANTLR

Specifically, the building of perhaps 20 rules to handle the different precedence levels of arithmetic expressions and top-down parsers is heinous. The LR bottom up version is much easier to read but is, of course is ambiguous. I'm working on a little prototype where you specify an LR-like rule for arithmetic expressions and ANTLR can translate it to a "precedence climbing" rule without left recursion. It uses a predicate to simply compare the precedence of the previous operator with the precedence of the next operator coming down the road. The order of the alternatives provides the precedence of operations from loosest to tightest binding.

At least for now, I'm distinguishing rules with left recursion you want removed by using '/' instead of '|' for the alternative operator. In principle, I can simply look for patterns in any rule (works great for declarations too) and avoid the special operator.

Anyway, here is my test cases:

grammar V;
options {output=AST;}

e : e ('+'^|'-'^) e
  / e '*'^ e
  / '-'^ e
  / e '.'^ ID
  / INT
  / ID
  ;

ID : 'a'..'z'+ ;
INT : '0'..'9'+ ;
WS : (' '|'\t'|'\n')+ {skip();} ;

it gets translated to

start: e : e_[0] ;
e_[int _p]
   :   e_primary
       ( {1 >= _p}?=>  ('+'^|'-'^) e_[2]
         | {2 >= _p}?=>  '*'^ e_[3]
         | {4 >= _p}?=> '.'^ ID
       )*
   ;
e_primary
    :    '-'^ e_[3]
	| INT
	| ID
    ;

here are some sample input-output pairs using my test rig; input text to trees:

3+a.b 		(+ 3 (. a b))
a.b+3		(+ (. a b) 3)
a.b+3*4		(+ (. a b) (* 3 4))
-b * c		(* (- b) c)
a + -b * c		(+ a (* (- b) c))

I have to  get the  left vs right associativity in still, but I am encouraged by these early results. This translation results in a massive space reduction in the generated parsers over the typical 20-level chain of rules for arithmetic expression parsing. It should also prove incredibly fast in comparison. For example, the standard solution requires 20 method calls to match an integer vs 1 or 2 here.

And, oh yea, it will actually work for C declarators too :)

declarator
        : '*' declarator
        / declarator '[' e ']'
        / declarator '[' ']' 
        / declarator '(' ')'
        / '(' declarator ')'
        / ID
        ; 

ter