[antlr-interest] lookahead DFA too big?

Jim Idle jimi at temporal-wave.com
Thu Mar 5 07:27:20 PST 2009


Andreas Meyer wrote:
> Maybe it's possible to partition the set of keywords, but that would be 
> some effort: figuring out for 800 keywords, where they appear, what is 
> the context they are used in etc. Note that the problem only appeared 
> after switching to ANTLR 3.1, ANTLR 2.7 was fine with it and the 
> generated parser works well.
>   

Andreas,

What language are you trying to parse? I think that you have taken the 
wrong approach perhaps but it is a little difficult to advise without 
knowing what you are trying to parse. Basically, what Loring is saying 
is that you need to step back a bit. Unfortunately, when there are this 
many keywords and they can all be identifiers too, there will have to be 
some classification, just as in English there are verbs, nouns, and so on.

Perhaps what you need is something like a set of explicit tokens for 
hard and fast keywords, then check ID at various other points. A hybrid 
approach in other words. Is the current grammar something you can email 
off-line rather than to the whole group? I suspect that the original 
parser for this language probably used a scannerless parser, or rather, 
a hand crafted program that was neither scanner nor parser in a 'real' 
sense.

> On some power-point presentation from Terence Parr, I have seen a slide 
> mentioning huge generated DFAs. Is there some additional material 
> available that documents some of the possible situations that make the 
> DFAs explode?
>   
NFA and DFA are wider concepts than ANTLR of course - you will find 
plenty of literature on this. As to ANTLR itself, then Ter is always 
willing to look at fixes/optimizations/etc but we need to provide some 
sort of concrete example. In many cases it is a matter of design and not 
bugs. When the DFA explodes because of dangling optional clauses in say 
a statement, it is often because there is no definitive end to a 
statement, such as Newline or ';' and so on.

With a look at your grammar, I can probably help you.


Jim


More information about the antlr-interest mailing list