[antlr-interest] lookahead DFA too big?
Paul Bouché
paul.bouche at apertio.com
Wed Mar 4 12:30:34 PST 2009
Hi,

I also had this problem when trying to introduce a backward-compatible
change into our lexer. The "code too large" problem aggravated the heck
out of me. Whenever I found a solution that did not break the Java code
size limit, I ended up with something that had less functionality here
and there than needed. In the end I gave up on the project and argued
that the change is not possible because it would break the current
syntax (to my luck, this is actually true).
Anyway, I never got around this problem and I still find it very
aggravating. I am not too excited about the lookahead mechanism,
because if you have to look ahead several characters before making a
decision, you have to go back in the stream and then rematch those
characters...
I also tried to port a hand-written parser/lexer to ANTLR which used a
state table and some context-sensitive decisions, and was not able to.
First I ran into the "code too large" exceptions, and then into
unresolvable ambiguities. Semantic predicates do not help much either,
because ANTLR still uses lookahead in the presence of semantic
predicates. I would like to switch that off - how?? If I use semantic
predicates, I know what I am doing and I don't need ANTLR's automatic
analysis.
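For what it's worth, the closest I found to "let the predicate decide"
in ANTLR v3 is a gated semantic predicate (`{...}?=>`), which is
consulted during prediction and disables the alternatives whose gate is
false, rather than only breaking ties after lookahead. A sketch with
made-up rule and helper names:

```antlr
// Illustrative only: isTypeName() is a hypothetical helper in the
// parser's @members section, and decl/typeDecl/varDecl are made-up
// rules. The gated predicate {...}?=> participates in prediction,
// unlike a plain {...}? predicate, which is only checked when
// lookahead alone cannot distinguish the alternatives.
decl
    : {isTypeName(input.LT(1).getText())}?=> typeDecl
    | varDecl
    ;
```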
Sorry for the rant. This is certainly not motivating for the developers,
of which I am one as well, on a different project. So let me say: ANTLR
is a great tool, I really like ANTLRWorks' debugging functionality, and
ANTLR lets you produce parsers whose meaning is explicitly expressed in
the grammar rather than implicitly hidden in hand-written code. The
automatic (context-sensitive) FOLLOW-set generation is also very helpful!
As for you, maybe you can try leaving your grammar ambiguous and turning
on backtracking and memoization (to keep the backtracking efficient). Of
course a backtracking grammar is slower than an LL(k)/LL(*) grammar, but
if you get what you want....
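In ANTLR v3 syntax, that suggestion is just two grammar-wide options (a
sketch; the grammar name is taken from the error output below):

```antlr
// Turn on automatic backtracking for every decision ANTLR cannot
// resolve with fixed lookahead, and memoize partial parse results so
// the backtracking cost stays roughly linear instead of exponential.
grammar ABAP4;

options {
    backtrack = true;
    memoize   = true;
}
```

Both options can also be set per rule instead of grammar-wide if only a
few decisions are problematic.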
BR,
Paul
Andreas Meyer wrote:
> Hi!
>
> Does anybody know what factors influence the size of the
> lookahead DFA? I read the ANTLR book, which is great. However, I found
> no details on how the LL(*) implementation actually computes these DFAs
> and how to avoid the DFA explosion that I encountered (is there a paper
> describing the ANTLR implementation of LL(*), btw?). I worked hard on my
> ABAP grammar, but I cannot figure out where this error comes from:
>
> $ java -Xmx1g org.antlr.Tool ABAP4.g
> warning(205): ABAP4.g:4468:2: ANTLR could not analyze this decision in
> rule atom; often this is because of recursive rule references visible
> from the left edge of alternatives. ANTLR will re-analyze the decision
> with a fixed lookahead of k=1. Consider using "options {k=1;}" for
> that decision and possibly adding a syntactic predicate.
> error(10): internal error:
> org.antlr.tool.Grammar.createLookaheadDFA(Grammar.java:1242): could not
> even do k=1 for decision 300; reason: timed out (>1000ms)
>
> This rule does not use recursion/loops/options etc., _but_ it contains a
> reference to a rule with 800 alternative tokens. When used in a simple
> hello-world grammar, things are fine and I can successfully parse a
> subset of ABAP. However, when I keep adding rules to the grammar that
> _use_ atom, but do not extend anything reachable from atom, I get this
> error message after a certain threshold. Lots of references to "atom"
> are at the end of rules, with +,*,? modifiers, and the follow sets of
> these rules (the rules that use atom) often contain many of the 800
> keyword tokens, so there would be an ambiguity. I (hopefully) resolved
> every ambiguity by using semantic predicates: at least I do not get
> any error message, only the one above. I have a feeling that the
> error might still have to do with the ambiguities between (loops of)
> identifiers and the keywords following the enclosing rules. My question
> would be: why does ANTLR complain about a timeout in rule "atom"? Only a
> decision between the identifier and the 800 keywords is made there, not
> the ambiguity resolution. My intuition tells me that this should be done
> at the calling site, not inside "atom".
>
> Note that the error goes away with -Xconversiontimeout 20000 (on a
> recent 2.x GHz Core2Duo), but the generated Java code is rather large,
> exceeding the JVM's 64k method-size limit in many different ways.
> However, I don't quite see why the DFAs should be so big. I hope there
> is someone on the list who has seen a similar problem. Unfortunately, I
> cannot post a nice and small grammar here: my smallest error-producing
> grammar is about 5000 lines long.
>
> Thanks a lot for any help!
> Andreas Meyer
> (smartshift.de)
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
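P.S. For reference, the combination that warning(205) suggests - a fixed
lookahead plus a syntactic predicate - looks roughly like this (rule and
token names are made up for illustration, not taken from your grammar):

```antlr
// Cap the lookahead for this rule's decision at k=1 and let a
// syntactic predicate (...)=> disambiguate by speculatively matching
// ahead instead of building a large prediction DFA.
atom
options { k = 1; }
    : (ID '(') => functionCall   // try this alternative first
    | ID
    | literal
    ;
```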
>