[antlr-interest] Tree grammar generated for the C runtime reports a syntax error where Java doesn't
Martin Potthast
martin.potthast at uni-weimar.de
Mon Jul 9 14:41:28 PDT 2012
Following up on this issue, I have narrowed down the problem with
wildcards not matching entire sub-trees in a tree parser further: it
turns out that the sub-tree is copied correctly, but the node stream
reader/matcher is not advanced past it accordingly. This is easy to
fix, as I will show below.
Consider the following example rule:
foo : . ;
This rule should match every tree, including all of its sub-trees.
The C code generated for this rule looks like this:
[...]
{
    _last = (pANTLR3_BASE_TREE)LT(1);
    wildcard1=(pANTLR3_BASE_TREE)LT(1);
    MATCHANYT();      /* consumes a single node */
    if (HASEXCEPTION())
    {
        goto ruleanyforestEx;
    }
    /* copies the whole tree rooted at wildcard1 */
    wildcard1_tree = (pANTLR3_BASE_TREE)ADAPTOR->dupTree(ADAPTOR, wildcard1);
    ADAPTOR->addChild(ADAPTOR, root_0, wildcard1_tree);
}
[...]
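As you can see, the whole tree rooted at wildcard1 is duplicated with
ADAPTOR->dupTree and added to root_0. Observe, however, that
MATCHANYT() does not mean "match any tree" but "match any token": it
consumes exactly one node from the input. Therefore, the input is
advanced by a single node even though the entire sub-tree has been
copied, and the remaining nodes of that sub-tree, including its DOWN
and UP markers, are left in the stream. This leaves the tree parser in
a corrupt state.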
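To make the mismatch concrete, here is a minimal sketch in plain C, independent of the ANTLR runtime, of how a tree node stream presents a tree: a node with children is followed by a DOWN marker, its children and an UP marker, while a leaf is emitted on its own. The `Tree` type, the `NODE_DOWN`/`NODE_UP` values and `flatten` are hypothetical names chosen for illustration, not runtime API.

```c
#include <assert.h>

/* Hypothetical stand-ins for the DOWN and UP navigation markers. */
enum { NODE_DOWN = -2, NODE_UP = -3 };

/* Hypothetical tree node: a token type plus up to four children. */
typedef struct Tree {
    int type;
    int nchildren;
    const struct Tree *children[4];
} Tree;

/* Append the node sequence for t to out[n..] and return the new length:
   the root, then (only if it has children) DOWN, each child, UP. */
static int flatten(const Tree *t, int *out, int n, int cap)
{
    assert(n < cap);
    out[n++] = t->type;
    if (t->nchildren > 0) {
        assert(n < cap);
        out[n++] = NODE_DOWN;
        for (int i = 0; i < t->nchildren; i++)
            n = flatten(t->children[i], out, n, cap);
        assert(n < cap);
        out[n++] = NODE_UP;
    }
    return n;
}
```

For ^(1 2 ^(3 4)) this yields 1 DOWN 2 3 DOWN 4 UP UP. Consuming only the root, as MATCHANYT() does, leaves the reader at the DOWN marker, with the other seven nodes of the already copied sub-tree still pending.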
Here is a workaround that solves the problem by patching the generated
tree parser manually:
[...]
{
    int depth = 0;    /* nesting level inside the matched sub-tree */

    _last = (pANTLR3_BASE_TREE)LT(1);
    wildcard1=(pANTLR3_BASE_TREE)LT(1);
    MATCHANYT();      /* consume the root node */
    if (!HASEXCEPTION() && LA(1) == DOWN)
    {
        /* The root has children: consume nodes up to and
           including the UP that closes its sub-tree. */
        do
        {
            switch ( LA(1) )
            {
                case DOWN:
                    depth += 1;
                    break;
                case UP:
                    depth -= 1;
                    break;
            }
            MATCHANYT();
        } while (!HASEXCEPTION() && depth > 0);
    }
    /* otherwise the root is a leaf: one node was all there was */
    if (HASEXCEPTION())
    {
        goto ruleanyforestEx;
    }
    wildcard1_tree = (pANTLR3_BASE_TREE)ADAPTOR->dupTree(ADAPTOR, wildcard1);
    ADAPTOR->addChild(ADAPTOR, root_0, wildcard1_tree);
}
[...]
Above, the loop around MATCHANYT() walks the tree node stream,
matching any node on the way. Traversal stops right after the root if
the root has no sub-tree, and otherwise as soon as the number of UP
nodes matched equals, for the first time, the number of DOWN nodes
matched before.
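The stopping rule can be checked in isolation. Below is a minimal sketch in plain C, not tied to the ANTLR runtime, that applies the same depth counting to a flat array of node types and returns the index just past the sub-tree rooted at `pos`; `skip_subtree` and the `NODE_DOWN`/`NODE_UP` values are hypothetical. It first checks whether the root is followed by DOWN at all, so that a leaf followed by a sibling (rather than by UP) consumes exactly one node.

```c
#include <assert.h>

/* Hypothetical markers standing in for the DOWN and UP navigation nodes. */
enum { NODE_DOWN = -2, NODE_UP = -3 };

/* Return the index just past the sub-tree whose root is at stream[pos].
   A leaf (root not followed by DOWN) occupies exactly one node. */
static int skip_subtree(const int *stream, int len, int pos)
{
    int depth = 0;

    assert(pos < len);
    pos++;                                  /* consume the root */
    if (pos >= len || stream[pos] != NODE_DOWN)
        return pos;                         /* no sub-tree: done */

    do {
        if (stream[pos] == NODE_DOWN)
            depth++;
        else if (stream[pos] == NODE_UP)
            depth--;
        pos++;                              /* consume any node */
    } while (depth > 0 && pos < len);

    return pos;
}
```

For the stream 1 DOWN 2 3 DOWN 4 UP UP 5, skipping from index 0 returns 8 (the whole tree), from index 2 returns 3 (leaf 2 with sibling 3), and from index 3 returns 7 (the sub-tree ^(3 4)).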
Is this a good solution?
Does it merit being added to the parser generator?
Martin
--
Martin Potthast
Bauhaus-Universität Weimar
www.webis.de --- www.netspeak.org