[antlr-interest] lookahead DFA too big?
Andreas Meyer
andreas.meyer at smartshift.de
Wed Mar 4 11:53:07 PST 2009
Hi!
Does anybody know what are the factors that influence the size of the
lookahead DFA? I read the ANTLR book, which is great. However, I found
no details on how the LL* implementation actually computes these DFA's
and how to avoid the DFA explosion that I encountered (is there a paper
describing the ANTLR implementation of LL(*), btw?). I worked hard on my
abap grammar, but I cannot figure out where this error comes from:
$ java -Xmx1g org.antlr.Tool ABAP4.g
warning(205): ABAP4.g:4468:2: ANTLR could not analyze this decision in
rule atom; often this is because of recursive rule references visible
from the left edge of alternatives. ANTLR will r
e-analyze the decision with a fixed lookahead of k=1. Consider using
"options {k=1;}" for that decision and possibly adding a syntactic
predicate.
error(10): internal error:
org.antlr.tool.Grammar.createLookaheadDFA(Grammar.java:1242): could not
even do k=1 for decision 300; reason: timed out (>1000ms)
This rule does not use recursion/loops/options etc, _but_ it contains a
reference to a rule with 800 alternative tokens. When used in a simple
hello-world grammar, things are fine and I can successfully parse a
subset of ABAP. However, when I keep adding rules to the grammar, that
_use_ atom, but that do not extend anything reachable from atom, after a
certain threshold I get this error message. Lots of references to "atom"
are at the end of rules, with +,*,? modifiers and the follow-sets of
these rules (the rules that use atom) often contain many of the 800
keyword tokens, so there would be an ambiguity. I (hopefully) resolved
every ambiguity, by using semantic predicates : at least I do not get
any error message, only the one above .... I have a feeling that the
error might still have to do with the ambiguities between (loops of)
identifiers and the keywords following the enclosing rules. My question
would be: why does ANTLR complain about a timeout in rule "atom"? Only a
decision among the identifiier / 800 keywords is done there, not the
ambiguity resolution. My intuition tells me that this should be done at
the "calling place", not inside "atom".
Note that the error goes away with -Xconversiontimeout 20000 (on a
recent 2.x GHz Core2Duo), but the generated Java code is rather large,
exceeding the JVM's 64k limit in many different ways. However, I dont
quite see why the DFAs should be so big? I hope there is someone on the
list who has seen a similar problem. Unfortunately, I cannot post a nice
and small grammar here: my smallest error-producing grammar is about
5000 lines long.
Thanks a lot for any help!
Andreas Meyer
(smartshift.de)
More information about the antlr-interest
mailing list