[antlr-interest] Tree grammar generated for the C runtime reports a syntax error where Java doesn't

Martin Potthast martin.potthast at uni-weimar.de
Tue Jul 3 12:08:03 PDT 2012

Hi Jim,

thanks for your quick reply, and sorry for my late response.

I proceeded as you recommended and found that the parser output of C
and Java is equivalent for the offending input, so the parser is not
to blame. Stepping through the generated code in the debugger, I found
that the tree parser correctly descends into the AST built by the
parser. The error occurs when, while processing the wildcard portion
of a rule, it finds a DOWN node where it expects an UP node.

Apparently, in the C runtime a wildcard matches only a single node
rather than a whole subtree.
(I remember a discussion about this elsewhere on this list where, I
believe, the resolution was that a wildcard should match a whole
subtree; is that right?)

My question now is: how can I specify a rule that accepts any
sequence of subtrees under a given node?

For example:

^(MY_NODE .*)

Here, the . should represent a subtree of arbitrary depth, and the *
should indicate that MY_NODE may have any number of such subtrees as
children. Is there a way to express this in a tree grammar without
spelling out every possible subtree explicitly?

Thanks again for your help!


On Mon, Jun 25, 2012 at 6:00 AM, Jim Idle <jimi at temporal-wave.com> wrote:
> This means that, in some way, your tree grammar does not reflect the tree
> that you are building. The only differences in behavior I have seen in this
> kind of thing are the known bug you mention and, because the C runtime has
> no exceptions, the occasional need for an extra top-level rule to make sure
> that errors occurring at the top node of the tree are reported (this can
> also apply to some parser grammars). Also, I think that the C runtime's
> treatment of wildcards is no longer quite the same as the Java version's.
> In cases like this, I use the debugger and simply step through the C code.
> Since your input causes the error very early in the tree walk, this should
> be easy to track down. Before doing that, though, I would produce the .dot
> file for a failing tree and use Graphviz to compare it with the tree
> produced by Java, to make sure that they are the same.
> See the many past emails for instructions on doing this (antlr.markmail.org).
> Jim
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Martin Potthast
> Sent: Sunday, June 24, 2012 10:17 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Tree grammar generated for the C runtime reports a
> syntax error where Java doesn't
> Dear everyone,
> I am currently developing a simple grammar, including a tree grammar, for a
> regex-like language. When I debug the tree grammar in Java using
> ANTLRworks, it seems to work fine. However, once I generate C code, some
> inputs fail with the following syntax error:
>     -unknown source-(0)  : error 1 : Unexpected node, at offset 0, near DOWN : syntax error...
> This might hint at a bug in the C runtime, though I'm not entirely sure
> about that. Anyway, since I'm at my wits' end, I was wondering whether one
> of you could help me.
> Attached you will find the grammars, their generated C code, and a test rig.
> I am using the latest stable release, ANTLRworks 1.4.3, and the latest C
> runtime, libantlr3c-3.4.tar.gz, compiled with the 64-bit flag.
> Clues:
> - The offending input is as simple as "[a]" (excluding the quotes).
> - The input "[?]" works; the only difference is that the question mark is
> a single node in the tree grammar, whereas the other alternatives may have
> an arbitrary number of subtrees, as indicated by the wildcards.
> - The grammar distinguishes between bracketed expressions that contain
> whitespace and those that don't. Again, the input "[a b]" fails in C but
> not in Java.
> I'd be very grateful for any help.
> Martin
> PS: On a minor note, when regenerating the C code from the grammars, you
> will notice that RegexWord.c won't compile because one function refers to a
> variable "stream_" that should be "stream_unit2" in my case. This is a
> known bug; after renaming the variable as indicated, the generated code
> compiles.
> --
> Martin Potthast
> Bauhaus-Universität Weimar
> www.webis.de  ---  www.netspeak.org

Martin Potthast
Bauhaus-Universität Weimar
www.webis.de  ---  www.netspeak.org
