[antlr-interest] lookahead DFA too big?

Loring Craymer lgcraymer at yahoo.com
Wed Mar 4 13:40:42 PST 2009


800 token types is a staggeringly large number and indicates that you took the wrong path in dealing with the keyword-versus-identifier problem.  In the cases where I have had this many "keywords", they can usually be decomposed into subcategories, with that information kept in a symbol table for use with semantic predicates (sempreds).  With this many token types, you want to differentiate locally, not globally: have the lexer recognize all keywords as plain "IDENTIFIER", and use sempreds in the parser to recognize keywords at the places where they should appear.
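To make that concrete, here is a minimal sketch of the "lex everything as IDENTIFIER, decide in the parser" shape, assuming ANTLR 3 with the Java target; the grammar, the rules, and the isKw() helper are illustrative, not ABAP-specific:

    grammar KwDemo;

    @parser::members {
      // Illustrative helper: compare the upcoming token's text against a
      // keyword. A real grammar would consult a symbol table or a map of
      // keyword subcategories here rather than a literal string.
      boolean isKw(String kw) {
        return input.LT(1).getText().equalsIgnoreCase(kw);
      }
    }

    // A "keyword" is just an IDENTIFIER whose text matches, gated by a
    // semantic predicate. The decision is made locally, at the one place
    // where the keyword can legally appear.
    writeStmt
      : {isKw("WRITE")}? IDENTIFIER expr
      ;

    expr : IDENTIFIER | NUMBER ;

    // The lexer stays keyword-free: one token type covers all words.
    IDENTIFIER : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
    NUMBER     : '0'..'9'+ ;
    WS         : (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; } ;

With that shape, each parser decision is over a handful of token types plus predicates instead of 800 distinct keyword tokens, so the lookahead DFAs stay small.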

--Loring



----- Original Message ----
> From: Andreas Meyer <andreas.meyer at smartshift.de>
> To: antlr-interest at antlr.org
> Sent: Wednesday, March 4, 2009 11:53:07 AM
> Subject: [antlr-interest] lookahead DFA too big?
> 
> Hi!
> 
> Does anybody know what factors influence the size of the lookahead 
> DFA? I read the ANTLR book, which is great. However, I found no details 
> on how the LL(*) implementation actually computes these DFAs and how to 
> avoid the DFA explosion that I encountered (is there a paper describing 
> the ANTLR implementation of LL(*), btw?). I worked hard on my ABAP 
> grammar, but I cannot figure out where this error comes from:
> 
> $ java -Xmx1g org.antlr.Tool ABAP4.g
> warning(205): ABAP4.g:4468:2: ANTLR could not analyze this decision in 
> rule atom; often this is because of recursive rule references visible 
> from the left edge of alternatives.  ANTLR will re-analyze the 
> decision with a fixed lookahead of k=1.  Consider using "options 
> {k=1;}" for that decision and possibly adding a syntactic predicate.
> error(10):  internal error: 
> org.antlr.tool.Grammar.createLookaheadDFA(Grammar.java:1242): could not 
> even do k=1 for decision 300; reason: timed out (>1000ms)
> 
> This rule does not use recursion/loops/options etc., _but_ it contains 
> a reference to a rule with 800 alternative tokens. When used in a 
> simple hello-world grammar, things are fine and I can successfully 
> parse a subset of ABAP. However, when I keep adding rules to the 
> grammar that _use_ atom, but that do not extend anything reachable from 
> atom, after a certain threshold I get this error message. Lots of 
> references to "atom" are at the end of rules, with +,*,? modifiers, and 
> the follow sets of these rules (the rules that use atom) often contain 
> many of the 800 keyword tokens, so there would be an ambiguity. I have 
> (hopefully) resolved every ambiguity by using semantic predicates: at 
> least I do not get any error message, only the one above. I have a 
> feeling that the error might still have to do with the ambiguities 
> between (loops of) identifiers and the keywords following the enclosing 
> rules. My question would be: why does ANTLR complain about a timeout in 
> rule "atom"? Only the decision among the identifier / 800 keywords is 
> made there, not the ambiguity resolution. My intuition tells me that 
> this should be done at the "calling place", not inside "atom".
> 
> Note that the error goes away with -Xconversiontimeout 20000 (on a 
> recent 2.x GHz Core2Duo), but the generated Java code is rather large, 
> exceeding the JVM's 64 KB limits in many different ways. However, I 
> don't quite see why the DFAs should be so big. I hope there is someone 
> on the list who has seen a similar problem. Unfortunately, I cannot 
> post a nice and small grammar here: my smallest error-producing grammar 
> is about 5000 lines long.
> 
> Thanks a lot for any help!
> Andreas Meyer
> (smartshift.de)