[antlr-interest] lookahead DFA too big?

Wed Mar 4 11:53:07 PST 2009

Hi!

Does anybody know what are the factors that influence the size of the 
lookahead DFA? I read the ANTLR book, which is great. However, I found 
no details on how the LL* implementation actually computes these DFA's 
and how to avoid the DFA explosion that I encountered (is there a paper 
describing the ANTLR implementation of LL(*), btw?). I worked hard on my 
abap grammar, but I cannot figure out where this error comes from:

$ java -Xmx1g org.antlr.Tool ABAP4.g
warning(205): ABAP4.g:4468:2: ANTLR could not analyze this decision in 
rule atom; often this is because of recursive rule references visible 
from the left edge of alternatives.  ANTLR will r
e-analyze the decision with a fixed lookahead of k=1.  Consider using 
"options {k=1;}" for that decision and possibly adding a syntactic 
predicate.
error(10):  internal error: 
org.antlr.tool.Grammar.createLookaheadDFA(Grammar.java:1242): could not 
even do k=1 for decision 300; reason: timed out (>1000ms)

This rule does not use recursion/loops/options etc, _but_ it contains a 
reference to a rule with 800 alternative tokens. When used in a simple 
hello-world grammar, things are fine and I can successfully parse a 
subset of ABAP. However, when I keep adding rules to the grammar, that 
_use_ atom, but that do not extend anything reachable from atom, after a 
certain threshold I get this error message. Lots of references to "atom" 
are at the end of rules, with +,*,? modifiers and the follow-sets of 
these rules (the rules that use  atom) often contain many of the 800 
keyword tokens, so there would be an ambiguity. I (hopefully) resolved 
every ambiguity, by using semantic predicates : at least I do not get 
any error message, only the one above .... I have a feeling that the 
error might still have to do with the ambiguities between (loops of) 
identifiers and the keywords following the enclosing rules. My question 
would be: why does ANTLR complain about a timeout in rule "atom"? Only a 
decision among the identifiier / 800 keywords is done there, not the 
ambiguity resolution. My intuition tells me that this should be done at 
the "calling place", not inside "atom".

Note that the error goes away with -Xconversiontimeout 20000 (on a 
recent 2.x GHz Core2Duo), but the generated Java code is rather large, 
exceeding the JVM's 64k limit in many different ways. However, I dont 
quite see why the DFAs should be so big? I hope there is someone on the 
list who has seen a similar problem. Unfortunately, I cannot post a nice 
and small grammar here: my smallest error-producing grammar is about 
5000 lines long.

Thanks a lot for any help!
Andreas Meyer
(smartshift.de)