[antlr-interest] [antlr-dev] C and C++ targets in ANTLR

Wed Apr 4 11:56:37 PDT 2012

What is happening is that your grammar is badly organized, at least in some
areas and this is causing the production of some massive DFAs. In order to
generate code that is more efficient in C, the defaults for tables vs
switch statements and related settings are different for C than for the
other targets. It is this that is causing ANTLR to fail. Look at the –X
options and specifically set them to the defaults or perhaps slightly
higher than the defaults. The generated code is then so large that it runs
out of memory producing it.

However, this still means that your grammar needs to be tweaked as
something in there is ambiguous enough to cause a huge DFA, which under the
differing –X defaults of the C target looks like it is not actually just
huge, but never ending. When you generate the java code, look at the
generated code and find out which of the DFAs are really big – then find
out where they are used and that will tell you the rules you need to look
at.

In your grammar, you have a lot of backtrack and memoize options. This is a
clue to the fact that your grammar is in need of some restructure (perhaps
quite a lot), and also will be unable to report errors accurately. The
other place to look might be the expression tree. Take out the backtracking
options one at a time and fix the resulting ambiguities and I bet that the
issue goes away.

Jim

PS: Please use the mailing list rather than emailing me directly as then
there are searchable answers for everyone.

*From:* Mike Lischke [mailto:mike at lischke-online.de]
*Sent:* Wednesday, April 04, 2012 2:09 AM
*To:* Jim Idle
*Subject:* Re: [antlr-dev] C and C++ targets in ANTLR

Hey Jim,

thanks a lot for taking care to answer.

Without the grammar, or a stack trace or some indication of whether you
mean that the 3.4 tool crashes when generating C, there is nothing I can do
to help you here.

You are right, there wasn't much info. I first wanted to check if there's a
simple measure like a setting or log or something like that, that could
help me without posting lots of text. In fact I'm using
the antlr-3.4-complete.jar (OSX that is):

java -jar antlr-3.4-complete.jar MySQL51Parser.g3 -o mysql.parser

no special VM memory assignment, as you can see. I tried 8GB and the jar
happily eats it, but this only delays the stack overflow.

The grammar I used is here: http://www.pasteall.org/30674 and the stack
trace is:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

        at java.util.AbstractList.listIterator(AbstractList.java:315)

        at java.util.AbstractList.listIterator(AbstractList.java:284)

        at org.antlr.misc.IntervalSet.add(IntervalSet.java:107)

        at org.antlr.misc.IntervalSet.add(IntervalSet.java:96)

        at org.antlr.misc.IntervalSet.add(IntervalSet.java:85)

        at org.antlr.misc.IntervalSet.of(IntervalSet.java:70)

        at org.antlr.analysis.Label.getSet(Label.java:220)

        at
org.antlr.codegen.ACyclicDFACodeGenerator.walkFixedDFAGeneratingStateMachine(ACyclicDFACodeGenerator.java:116)

        at
org.antlr.codegen.ACyclicDFACodeGenerator.walkFixedDFAGeneratingStateMachine(ACyclicDFACodeGenerator.java:144)

        at
org.antlr.codegen.ACyclicDFACodeGenerator.walkFixedDFAGeneratingStateMachine(ACyclicDFACodeGenerator.java:144)

        at
org.antlr.codegen.ACyclicDFACodeGenerator.walkFixedDFAGeneratingStateMachine(ACyclicDFACodeGenerator.java:144)

        at
org.antlr.codegen.ACyclicDFACodeGenerator.walkFixedDFAGeneratingStateMachine(ACyclicDFACodeGenerator.java:144)

        at
org.antlr.codegen.ACyclicDFACodeGenerator.walkFixedDFAGeneratingStateMachine(ACyclicDFACodeGenerator.java:144)

        at
org.antlr.codegen.ACyclicDFACodeGenerator.walkFixedDFAGeneratingStateMachine(ACyclicDFACodeGenerator.java:144)

        at
org.antlr.codegen.ACyclicDFACodeGenerator.walkFixedDFAGeneratingStateMachine(ACyclicDFACodeGenerator.java:144)

        at
org.antlr.codegen.ACyclicDFACodeGenerator.genFixedLookaheadDecision(ACyclicDFACodeGenerator.java:47)

        at
org.antlr.codegen.CodeGenerator.genLookaheadDecision(CodeGenerator.java:640)

        at
org.antlr.grammar.v3.CodeGenTreeWalker.block(CodeGenTreeWalker.java:2876)

        at
org.antlr.grammar.v3.CodeGenTreeWalker.rule(CodeGenTreeWalker.java:2382)

        at
org.antlr.grammar.v3.CodeGenTreeWalker.rules(CodeGenTreeWalker.java:1537)

        at
org.antlr.grammar.v3.CodeGenTreeWalker.grammarSpec(CodeGenTreeWalker.java:1441)

        at
org.antlr.grammar.v3.CodeGenTreeWalker.grammar_(CodeGenTreeWalker.java:477)

        at
org.antlr.codegen.CodeGenerator.genRecognizer(CodeGenerator.java:421)

        at org.antlr.Tool.generateRecognizer(Tool.java:655)

        at org.antlr.Tool.process(Tool.java:468)

        at org.antlr.Tool.main(Tool.java:93)

I imagine that there is something wrong with the grammar that happens to be
OK with the other targets and is not in the C target code generating
templates (which were changed between 3.3 and 3.4).

Which is what me leaves me so perplex. How can it crash only for certain
targets, but not in others? The parsing of the grammar is the same, no?

However you will need to experiment commenting out rules until you find the
rule that causes ANTLR to break. Also, did you try 3.3 instead of 3.4?

Yes, that seems to be the only option now ... :-( If there were only a log
that I could scan to see what happened last before things went havoc.

I also tried 3.3 and the same crash happens just with a partially different
stack:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

        at java.util.HashMap.addEntry(HashMap.java:753)

        at java.util.HashMap.put(HashMap.java:385)

        at
org.antlr.stringtemplate.StringTemplate.rawSetAttribute(StringTemplate.java:654)

        at
org.antlr.stringtemplate.StringTemplate.setAttribute(StringTemplate.java:508)

        at
org.antlr.codegen.ACyclicDFACodeGenerator.walkFixedDFAGeneratingStateMachine(ACyclicDFACodeGenerator.java:84)

        at
org.antlr.codegen.ACyclicDFACodeGenerator.walkFixedDFAGeneratingStateMachine(ACyclicDFACodeGenerator.java:144)

...

What would help here to fix this problem is an indication which rule is
currently being parsed, which would narrow down the search.

I am sorry but I can’t work out what you mean by “ANTLR.works gives me
green on the grammar (but doesn't generate the compiler either, however it
also has problems with the java target, so I better leave it out for tasks
other than editing and syntax checking).” In the first paragraph you said
that the Java target works, but here you seem to be saying that it does
not, or perhaps just does not with ANTLR Works?

ANTLR Works is a great tool and I was happy to have it for syntax checks,
refactoring etc. But unfortunately I couldn't generate the parser with it
using the Java target (even tho AW did not show any syntax error). It just
hung while on the command line the generation was done in like 30 secs
(similar to the generation time for java and c# on a Windows box). Also it
just did nothing when I tried to generate the parser when I left the line
192 (http://www.pasteall.org/30674) uncommented. No reaction, no message,
simply nothing. Really took me a while to figure out this line was the
culprit. What adds to the trouble is that once ANTLR Works had this problem
not even fixing the line made it work. I had to restart the tool and could
then run the next attempt. Took me quite some time as you can imagine.
Another problem is that with a larger grammar the UI locks up for a few
seconds with every keypress if there are changes to multiline comments
(which I used to find the offending line). But this is a different story
and I can work around that. My main concern is really to find the problem
in the grammar that makes ANTLR crash.

Thanks again for trying to help!

Mike
-- 
www.soft-gems.net