[antlr-interest] rules for re-factoring grammars?
Gavin Lambert
antlr at mirality.co.nz
Fri Mar 21 18:22:02 PDT 2008
At 13:12 22/03/2008, Benjamin Shropshire wrote:
>then parse an H out of "FBCG" (assuming the needed actions to
>rebuild the left recursive tree)
>
>old: FBCG -> h(a(F), i(B), i(C),G);
>new: FBCG -> h(a(a(a(F),B),C),G);
>
>Am I missing something here?
Well, I'm not sure why you seem to think the "new" parse tree is a
product of removing the left recursion. The original recursive
rule you specified will produce the "new" tree just as easily as
the new rule (actually more easily, since that's not really the
way the new rule works). The only difference in fact that I can
see between the two trees is whether they're matching greedily or
not. (The "old" tree is non-greedy while the "new" tree is
greedy. And ANTLR always matches greedily, so both rules should
have netted you the "new" tree, assuming ANTLR didn't get stuck on
the left recursion.)
But yes, generally speaking any change in the grammar can lead to
a change in the parse tree, even if it's not changing the input
language structure. This is why a parse tree is usually not an
especially useful output (except possibly for initial debugging),
and why you should either use direct actions or output an AST
instead. This is because the parse tree directly reflects the
grammar structure, which is not necessarily a good match to the
language structure.
Ignoring those new rules for the moment:
old : FBCD -> a(a(a(F), B), C), a(D);
new : FBCD -> a(F, B, C), a(D);
That's the *real* difference in the parse tree between the two --
the first requires recursive invocation while the second does
not. They still match the same language though.
>H : A I* G;
>
>I : B | C;
If you do add those new rules in, then you've introduced an
ambiguity that can only be resolved by looking ahead far enough to
see if there's a G or not. That's got nothing to do with whether
there's left recursion or not.
But I'm not really sure why you'd want to introduce I anyway,
since "A I*" is exactly covered by "A" -- the matching input sets
for the two rules are completely identical. In other words, it's
not just ambiguous, it's actually redundant.
More information about the antlr-interest
mailing list