[antlr-interest] Tree grammar generated for the C runtime reports a syntax error where Java doesn't

Martin Potthast martin.potthast at uni-weimar.de
Mon Jul 9 14:41:28 PDT 2012


Following up on this issue, I have further narrowed down the problem
with wildcards not matching entire sub-trees in a tree parser
generated for the C runtime. It turns out that the sub-tree is copied
correctly, but the stream the parser reads from is not advanced past
it. This is easy to fix, as I will show below.

Consider the following example rule:

foo : . ;

This rule should match every tree, including all of its sub-trees.
The C code generated for this rule looks like this:

[...]
{
    _last = (pANTLR3_BASE_TREE)LT(1);
    wildcard1=(pANTLR3_BASE_TREE)LT(1);

    MATCHANYT();
    if  (HASEXCEPTION())
    {
      goto ruleanyforestEx;
    }

    wildcard1_tree = (pANTLR3_BASE_TREE)ADAPTOR->dupTree(ADAPTOR, wildcard1);
    ADAPTOR->addChild(ADAPTOR, root_0, wildcard1_tree);
}
[...]

As you can see, the entire sub-tree rooted at wildcard1 is duplicated
with ADAPTOR->dupTree and the copy, wildcard1_tree, is added to root_0.
Observe, however, that MATCHANYT() does not mean "match any tree", but
"match any token": it consumes a single node only. The stream is
therefore not advanced past the sub-tree that has just been copied,
which leaves the tree parser in a corrupt state.
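
To make the mismatch concrete, here is a minimal stand-alone sketch in
plain C. It does not use the ANTLR runtime: the token values, the array
layout and the name match_any_token are assumptions made purely for
illustration. It lays out the tree ^(A B C) followed by a sibling D as
a flat node stream with imaginary DOWN and UP nodes, and models a
single "match any token" step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token types for this sketch. DOWN and UP stand in for the
   imaginary navigation nodes of a tree node stream. */
enum { TOK_DOWN = 2, TOK_UP = 3, TOK_A = 4, TOK_B = 5, TOK_C = 6, TOK_D = 7 };

/* ^(A B C) followed by a sibling D, flattened: A DOWN B C UP D */
static const int example_stream[] = {
    TOK_A, TOK_DOWN, TOK_B, TOK_C, TOK_UP, TOK_D
};
static const size_t example_len =
    sizeof example_stream / sizeof example_stream[0];

/* Model of a "match any token" step: consume exactly one node and
   return the new cursor position. */
static size_t match_any_token(size_t len, size_t pos)
{
    assert(pos < len);   /* there must be a node left to consume */
    return pos + 1;
}
```

Starting at index 0 (the root A), one step moves the cursor to index 1,
which holds the DOWN node, while the sibling D that the parser should
see next sits at index 5.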

Here's a workaround that solves the problem when patching the tree
parser manually:

[...]
{
    _last = (pANTLR3_BASE_TREE)LT(1);
    wildcard1=(pANTLR3_BASE_TREE)LT(1);

    int depth = 0;
    for (;;)
    {
        MATCHANYT();
        if (depth == 0 && LA(1) != ANTLR3_TOKEN_DOWN) /* no sub tree found */
        {
            break;
        }
        switch ( LA(1) )
        {
            case ANTLR3_TOKEN_DOWN:
                depth += 1;
                break;
            case ANTLR3_TOKEN_UP:
                depth -= 1;
                break;
        }
        if (depth == 0) /* sub tree finished */
        {
            MATCHT(ANTLR3_TOKEN_UP, NULL);
            break;
        }
    }

    if  (HASEXCEPTION())
    {
        goto ruleanyforestEx;
    }

    wildcard1_tree = (pANTLR3_BASE_TREE)ADAPTOR->dupTree(ADAPTOR, wildcard1);
    ADAPTOR->addChild(ADAPTOR, root_0, wildcard1_tree);
}
[...]

Above, the for loop around MATCHANYT() walks the sub-tree, matching
every token on the way. Traversal stops either when the node has no
sub-tree at all, or as soon as the number of UP nodes matched equals
the number of DOWN nodes matched before them.
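
The depth-counting idea can be tried outside the generated parser.
What follows is a minimal sketch in plain C, not the generated code: it
works on a flat array of token types instead of the ANTLR node stream,
and the name skip_subtree and the token values are assumptions made for
illustration. Unlike the patch above, it looks at each token before
consuming it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical navigation token types for this sketch. */
enum { TOK_DOWN = 2, TOK_UP = 3 };

/* Advance past the node at `pos` and, if a DOWN follows it, past its
   whole child list up to and including the matching UP.
   Returns the new cursor position. */
static size_t skip_subtree(const int *stream, size_t len, size_t pos)
{
    int depth = 0;

    assert(pos < len);
    pos++;                                  /* consume the root node */
    if (pos >= len || stream[pos] != TOK_DOWN)
    {
        return pos;                         /* leaf: no sub-tree to skip */
    }

    do
    {
        if (stream[pos] == TOK_DOWN)
        {
            depth += 1;
        }
        else if (stream[pos] == TOK_UP)
        {
            depth -= 1;
        }
        pos++;                              /* consume DOWN, UP or a node */
    }
    while (depth > 0 && pos < len);         /* stop after the matching UP */

    return pos;
}
```

For the stream 10 DOWN 11 DOWN 12 UP 13 UP 14, that is ^(10 ^(11 12) 13)
followed by 14, calling skip_subtree at index 0 returns 8 (the sibling
14), at index 2 it returns 6 (the node 13), and at the leaf at index 4
it returns 5 without consuming the UP that follows.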

Is this a good solution?
Does it merit being added to the parser generator?

Martin

-- 
Martin Potthast
Bauhaus-Universität Weimar
www.webis.de  ---  www.netspeak.org

