[antlr-interest] Tree grammar generated for the C runtime reports a syntax error where Java doesn't

Jim Idle jimi at temporal-wave.com
Sun Jun 24 21:00:00 PDT 2012


This means that your tree grammar does not, in some way, reflect the tree
that you are building. The only differences in behavior I have seen in this
kind of thing are the known bug you mention and the fact that, because C
lacks exceptions, an extra top-level rule is sometimes needed to make sure
that errors occurring at the top node of the tree are reported (this can
also apply to some parser grammars). Also, I think that the treatment of
wildcards in the C runtime is no longer quite the same as in the Java
version.
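
One way to read the "Unexpected node ... near DOWN" message quoted below: an
ANTLR 3 tree parser walks the tree as a flat stream in which the children of
a node are wrapped in imaginary DOWN and UP tokens, so a node with children
is followed by DOWN whereas a leaf is not. A rule that expects a leaf but
meets a node with children will therefore trip over DOWN. The C sketch below
flattens a tree in this way. The Node type, the function names, and any tree
you feed it are hypothetical stand-ins; this is not libantlr3c code and does
not reproduce the trees the attached grammar builds.

```c
#include <assert.h>
#include <string.h>

#define MAX_CHILDREN 4

/* Hypothetical minimal tree node; NOT a libantlr3c type. */
typedef struct Node {
    const char        *text;                    /* token text of this node */
    const struct Node *children[MAX_CHILDREN];  /* child subtrees, left to right */
    int                child_count;
} Node;

/* Append tok to the NUL-terminated string in out (capacity cap),
 * separated by a single space; silently truncates if out is full. */
static void append_token(char *out, size_t cap, const char *tok)
{
    size_t len = strlen(out);

    if (len > 0 && len + 1 < cap) {
        out[len++] = ' ';
        out[len] = '\0';
    }
    strncat(out, tok, cap - len - 1);
}

/* Flatten the subtree at n the way a tree node stream presents it:
 * a leaf is just its token; a node with children is followed by
 * DOWN, its flattened children, and UP. */
static void flatten(const Node *n, char *out, size_t cap)
{
    int i;

    assert(n != NULL && out != NULL && cap > 0);
    append_token(out, cap, n->text);
    if (n->child_count > 0) {
        append_token(out, cap, "DOWN");
        for (i = 0; i < n->child_count; i++) {
            flatten(n->children[i], out, cap);
        }
        append_token(out, cap, "UP");
    }
}
```

For a made-up tree in which BRACKET has the single leaf child "?", flatten
yields "BRACKET DOWN ? UP". If the child is instead a made-up WORD node that
itself has a child "a", the result is "BRACKET DOWN WORD DOWN a UP UP", and
a rule written for the first shape would meet a DOWN where it expects UP.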

In cases like this, I use the debugger and just follow the C code. Since
your input causes the error to be reported very early in the tree walk, it
would seem that this should be easy to track down. But before doing that,
I would produce the .dot file for a failing tree and then use Graphviz to
compare it to the tree produced by Java to make sure that they are the same.
See many past emails for instructions on doing this (antlr.markmail.org).
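
As a rough illustration of the comparison suggested above, here is a
minimal, self-contained C sketch that writes a tree in Graphviz DOT format.
The Node type and the emit_node and write_dot functions are hypothetical and
are not part of libantlr3c; with the real runtime you would dump the tree
your parser built instead. Labels are not escaped, so the sketch assumes
node text without double quotes or backslashes.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_CHILDREN 4

/* Hypothetical minimal tree node; NOT a libantlr3c type. */
typedef struct Node {
    const char        *text;                    /* label shown in the graph */
    const struct Node *children[MAX_CHILDREN];  /* child subtrees, left to right */
    int                child_count;
} Node;

/* Write the subtree rooted at n, numbering nodes depth-first from id.
 * Returns the next unused id. */
static int emit_node(FILE *out, const Node *n, int id)
{
    int next = id + 1;
    int i;

    fprintf(out, "  n%d [label=\"%s\"];\n", id, n->text);
    for (i = 0; i < n->child_count; i++) {
        fprintf(out, "  n%d -> n%d;\n", id, next);
        next = emit_node(out, n->children[i], next);
    }
    return next;
}

/* Write a whole tree as a DOT digraph; returns the number of nodes written. */
static int write_dot(FILE *out, const Node *root)
{
    int count;

    assert(out != NULL && root != NULL);
    fprintf(out, "digraph ast {\n");
    count = emit_node(out, root, 0);
    fprintf(out, "}\n");
    return count;
}
```

Calling write_dot on a file opened with fopen and then running
"dot -Tpng tree.dot -o tree.png" renders the tree; doing the same for the
tree from each runtime makes differences in shape easy to spot.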

Jim

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Martin Potthast
Sent: Sunday, June 24, 2012 10:17 PM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Tree grammar generated for the C runtime reports a
syntax error where Java doesn't

Dear everyone,

I am currently developing a simple grammar for a regex-like language that
involves a tree grammar. When I debug the tree grammar in Java using
ANTLRWorks, it seems to work fine. However, once I generate C code, some
inputs fail with the following syntax error:
    -unknown source-(0)  : error 1 : Unexpected node, at offset 0, near DOWN : syntax error...

This might hint at a bug in the C runtime, though I'm not entirely sure
about that. Anyway, since I'm at my wits' end about this, I was wondering
whether one of you could help me.

Attached you will find the grammars, their generated C code, and a test rig.
I am using the latest stable release, ANTLRWorks 1.4.3, and the latest C
runtime, libantlr3c-3.4.tar.gz, compiled with the 64-bit flag.

Clues:
- The offending input is as simple as "[a]" (excluding the quotes).
- The input "[?]" works, the only difference being that the question mark is
a single node in the tree grammar, whereas other possibilities may have an
arbitrary number of sub-trees, as indicated by the wildcards.
- The grammar distinguishes between bracketed expressions that contain
whitespace and those that don't. Again, the input "[a b]" fails in C, but
not in Java.

I'd be very happy if anyone could help me.

Martin


PS: On a minor note, when regenerating the C code from the grammars you will
notice that RegexWord.c won't compile because one function contains a line
referring to a variable "stream_" that should be "stream_unit2" in my case.
This is a known bug, and after changing the variable name as indicated, the
generated code compiles.


--
Martin Potthast
Bauhaus-Universität Weimar
www.webis.de  ---  www.netspeak.org

