[antlr-interest] rules for re-factoring grammars?

Gavin Lambert antlr at mirality.co.nz
Fri Mar 21 18:22:02 PDT 2008


At 13:12 22/03/2008, Benjamin Shropshire wrote:
 >then parse an H out of "FBCG" (assuming the needed actions to
 >rebuild the left recursive tree)
 >
 >old:    FBCG -> h(a(F), i(B), i(C),G);
 >new:   FBCG -> h(a(a(a(F),B),C),G);
 >
 >Am I missing something here?

Well, I'm not sure why you seem to think the "new" parse tree is a 
product of removing the left recursion.  The original recursive 
rule you specified will produce the "new" tree just as easily as 
the new rule (actually more easily, since that's not really the 
way the new rule works).  The only difference in fact that I can 
see between the two trees is whether they're matching greedily or 
not.  (The "old" tree is non-greedy while the "new" tree is 
greedy.  And ANTLR always matches greedily, so both rules should 
have netted you the "new" tree, assuming ANTLR didn't get stuck on 
the left recursion.)

But yes, generally speaking any change in the grammar can lead to 
a change in the parse tree, even if it's not changing the input 
language structure.  This is why a parse tree is usually not an 
especially useful output (except possibly for initial debugging), 
and why you should either use direct actions or output an AST 
instead.  This is because the parse tree directly reflects the 
grammar structure, which is not necessarily a good match to the 
language structure.

Ignoring those new rules for the moment:

old : FBCD -> a(a(a(F), B), C), a(D);
new : FBCD -> a(F, B, C), a(D);

That's the *real* difference in the parse tree between the two -- 
the first requires recursive invocation while the second does 
not.  They still match the same language though.

 >H : A I* G;
 >
 >I : B | C;

If you do add those new rules in, then you've introduced an 
ambiguity that can only be resolved by looking ahead far enough to 
see if there's a G or not.  That's got nothing to do with whether 
there's left recursion or not.

But I'm not really sure why you'd want to introduce I anyway, 
since "A I*" is exactly covered by "A" -- the matching input sets 
for the two rules are completely identical.  In other words, it's 
not just ambiguous, it's actually redundant.



More information about the antlr-interest mailing list