[antlr-interest] lookahead DFA too big?

Thu Mar 5 09:55:16 PST 2009

Jim Idle schrieb:
> Andreas Meyer wrote:
>> Maybe it's possible to partition the set of keywords, but that would 
>> be some effort: figuring out for 800 keywords, where they appear, 
>> what is the context they are used in etc. Note that the problem only 
>> appeared after switching to ANTLR 3.1, ANTLR 2.7 was fine with it and 
>> the generated parser works well.
>>   
>
> Andreas,
>
> What language are you trying to parse? 

It is SAP ABAP.

> I think that you have taken the wrong approach perhaps but it is a 
> little difficult to advise without knowing what you are trying to 
> parse. Basically, what Loring is saying is that you need to step back 
> a bit. Unfortunately, when there are this many keywords and they can 
> all be identifiers too, there will have to be some classification, 
> just as in English there are verbs, nouns, and so on.
>
> Perhaps what you need is something like a set of explicit tokens for 
> hard and fast keywords, then check ID at various other points. A 
> hybrid approach in other words. Is the current grammar something you 
> can email off-line rather than to the whole group? 

This is a bit tricky, the problem occurs only after adding up at lot of 
rules and the smallest error-inducing grammar (derived from the initial 
grammar by leaving out rules here and there) is 5000 lines long. I 
cannot expect anybody to look at this in detail :-)

> I suspect that the original parser for this language probably used a 
> scannerless parser, or rather, a hand crafted program that was neither 
> scanner nor parser in a 'real' sense.

It used (and uses) a chain of token sources, starting with an ANTLR 
generated lexer for most of the lexing work and some handwritten lexer 
adapters that manipulate the stream of tokens.

>> On some power-point presentation from Terence Parr, I have seen a 
>> slide mentioning huge generated DFAs. Is there some additional 
>> material available that documents some of the possible situations 
>> that make the DFAs explode?
>>   
> NFA and DFA are wider concepts than ANTLR of course - you will find 
> plenty of literature on this. As to ANTLR itself, then Ter is always 
> willing to look at fixes/optimizations/etc but we need to provide some 
> sort of concrete example. In many cases it is a matter of design and 
> not bugs. When the DFA explodes because of dangling optional clauses 
> in say a statement, it is often because there is no definitive end to 
> a statement, such as Newline or ';' and so on.
>
> With a look at your grammar, I can probably help you.

I will try to come up with a minimal example that reproduces the behaviour.

Thanks a lot for your patience!
Andreas Meyer