[antlr-interest] lookahead DFA too big?
Paul Bouché
paul.bouche at apertio.com
Wed Mar 4 12:30:34 PST 2009
Hi,

I also had this problem when trying to introduce a backward-compatible
change into our lexer. The "code too large" problem aggravated the heck
out of me. Whenever I found a solution that did not break the Java code
size limit, I ended up with something that had less functionality here
and there than needed. In the end I gave up on the project and argued
that the change is not possible because it would break the current
syntax (to my luck, this is actually true).
Anyway, I never got around this problem and I still find it very
aggravating. I am not too excited about the lookahead mechanism,
because if you have to look ahead several characters before making a
decision, you have to go back in the stream and then rematch those
characters...
I also tried to port a hand-written parser/lexer to ANTLR which used a
state table and some context-sensitive decisions, and was not able to.
First I ran into the "code too large" exceptions, and then into
unresolvable ambiguities. Semantic predicates do not help much either,
because ANTLR still uses lookahead in the presence of semantic
predicates. I would like to switch that off - how?? If I use semantic
predicates, I know what I am doing and I don't need ANTLR's automatic
analysis.
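For what it's worth, the closest I found to "let the predicate decide"
in ANTLR v3 is a gated semantic predicate (`{...}?=>`), which is
consulted during prediction and disables the alternatives whose gate is
false, rather than only breaking ties after lookahead. A sketch with
made-up rule and helper names:

```antlr
// Illustrative only: isTypeName() is a hypothetical helper in the
// parser's @members section, and decl/typeDecl/varDecl are made-up
// rules. The gated predicate {...}?=> participates in prediction,
// unlike a plain {...}? predicate, which is only checked when
// lookahead alone cannot distinguish the alternatives.
decl
    : {isTypeName(input.LT(1).getText())}?=> typeDecl
    | varDecl
    ;
```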
Sorry for the rant. This is certainly not motivating for the developers,
of which I am one as well, on a different project. So let me say: ANTLR
is a great tool, I really like ANTLRWorks' debugging functionality, and
ANTLR lets you produce parsers whose meaning is explicitly expressed in
the grammar rather than implicitly hidden in hand-written code. The
automatic (context-sensitive) FOLLOW-set generation is also very helpful!
As for you, maybe you can try leaving your grammar ambiguous and turning
on backtracking and memoization (to keep the backtracking efficient). Of
course a backtracking grammar is slower than an LL(k)/LL(*) grammar, but
if you get what you want....
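In ANTLR v3 syntax, that suggestion is just two grammar-wide options (a
sketch; the grammar name is taken from the error output below):

```antlr
// Turn on automatic backtracking for every decision ANTLR cannot
// resolve with fixed lookahead, and memoize partial parse results so
// the backtracking cost stays roughly linear instead of exponential.
grammar ABAP4;

options {
    backtrack = true;
    memoize   = true;
}
```

Both options can also be set per rule instead of grammar-wide if only a
few decisions are problematic.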
BR,
Paul
Andreas Meyer wrote:
> Hi!
>
> Does anybody know what factors influence the size of the
> lookahead DFA? I read the ANTLR book, which is great. However, I found
> no details on how the LL(*) implementation actually computes these DFAs
> and how to avoid the DFA explosion that I encountered (is there a paper
> describing the ANTLR implementation of LL(*), btw?). I worked hard on my
> ABAP grammar, but I cannot figure out where this error comes from:
>
> $ java -Xmx1g org.antlr.Tool ABAP4.g
> warning(205): ABAP4.g:4468:2: ANTLR could not analyze this decision in
> rule atom; often this is because of recursive rule references visible
> from the left edge of alternatives. ANTLR will re-analyze the decision
> with a fixed lookahead of k=1. Consider using "options {k=1;}" for
> that decision and possibly adding a syntactic predicate.
> error(10): internal error:
> org.antlr.tool.Grammar.createLookaheadDFA(Grammar.java:1242): could not
> even do k=1 for decision 300; reason: timed out (>1000ms)
>
> This rule does not use recursion/loops/options etc., _but_ it contains a
> reference to a rule with 800 alternative tokens. When used in a simple
> hello-world grammar, things are fine and I can successfully parse a
> subset of ABAP. However, when I keep adding rules to the grammar that
> _use_ atom, but do not extend anything reachable from atom, I get this
> error message after a certain threshold. Lots of references to "atom"
> are at the end of rules, with +,*,? modifiers, and the follow sets of
> these rules (the rules that use atom) often contain many of the 800
> keyword tokens, so there would be an ambiguity. I (hopefully) resolved
> every ambiguity by using semantic predicates: at least I do not get
> any error message, only the one above. I have a feeling that the
> error might still have to do with the ambiguities between (loops of)
> identifiers and the keywords following the enclosing rules. My question
> would be: why does ANTLR complain about a timeout in rule "atom"? Only a
> decision between the identifier and the 800 keywords is made there, not
> the ambiguity resolution. My intuition tells me that this should be done
> at the calling site, not inside "atom".
>
> Note that the error goes away with -Xconversiontimeout 20000 (on a
> recent 2.x GHz Core2Duo), but the generated Java code is rather large,
> exceeding the JVM's 64k method-size limit in many different ways.
> However, I don't quite see why the DFAs should be so big. I hope there
> is someone on the list who has seen a similar problem. Unfortunately, I
> cannot post a nice and small grammar here: my smallest error-producing
> grammar is about 5000 lines long.
>
> Thanks a lot for any help!
> Andreas Meyer
> (smartshift.de)
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
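P.S. For reference, the combination that warning(205) suggests - a fixed
lookahead plus a syntactic predicate - looks roughly like this (rule and
token names are made up for illustration, not taken from your grammar):

```antlr
// Cap the lookahead for this rule's decision at k=1 and let a
// syntactic predicate (...)=> disambiguate by speculatively matching
// ahead instead of building a large prediction DFA.
atom
options { k = 1; }
    : (ID '(') => functionCall   // try this alternative first
    | ID
    | literal
    ;
```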
>