[antlr-interest] lookahead DFA too big?

Andreas Meyer andreas.meyer at smartshift.de
Thu Mar 5 02:50:38 PST 2009


Maybe it's possible to partition the set of keywords, but that would be 
some effort: figuring out, for 800 keywords, where they appear, what 
context they are used in, etc. Note that the problem only appeared 
after switching to ANTLR 3.1; ANTLR 2.7 was fine with it, and the 
generated parser works well.

As advertised in the ANTLR book, I also used semantic predicates to 
locally check whether an identifier is actually the keyword I expect. This 
did not work out: the language is very verbose and most sentences read 
like natural English, so the parser sees ID ID ID ID and has to 
check 800 semantic predicates in order to find out which keyword comes 
next. Having only one token type (ID) may have been too extreme, and having 
different groups does seem interesting, although probably tedious.
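
To make the predicate problem concrete, here is a minimal sketch (not the poster's actual grammar; the token list and names are invented) of what a sempred-per-keyword scheme boils down to when every keyword is lexed as ID:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: with a single ID token type, each keyword alternative needs a
// semantic predicate comparing the upcoming token's text. In an ANTLR 3
// grammar this might look like:
//   kwSelect : {input.LT(1).getText().equals("select")}? ID ;
// With ~800 such rules, the parser can end up evaluating hundreds of
// predicates before one matches -- the slowdown described above.
public class SempredSketch {
    // Stand-in token stream: the parser only sees ID ID ID ID.
    static List<String> tokens = Arrays.asList("select", "record", "from", "table");

    // Stand-in for one semantic predicate: does the token at position i
    // carry the text of this particular keyword?
    static boolean isKeyword(int i, String kw) {
        return tokens.get(i).equals(kw);
    }

    public static void main(String[] args) {
        // The parser effectively tries predicate after predicate:
        System.out.println(isKeyword(0, "record")); // false, try the next one...
        System.out.println(isKeyword(0, "select")); // true, finally matched
    }
}
```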

In a PowerPoint presentation by Terence Parr, I saw a slide 
mentioning huge generated DFAs. Is there any additional material 
available that documents the situations that make the lookahead 
DFAs explode?

Andreas

Loring Craymer wrote:
> 800 token types is a staggeringly large number and indicates that you took the wrong path in dealing with the keyword versus identifier problem.  In the cases where I have had this many "keywords", they can usually be decomposed into subcategories and that information kept in a symbol table for use with sempreds.  With this many token types, you want to differentiate locally (use sempreds to recognize keywords where they should appear), not globally (all keywords should be recognized as "IDENTIFIER" in the lexer).
>
> --Loring
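
A sketch of the subcategory idea Loring describes, with invented category names and keywords: the symbol table maps each keyword to a group, and one predicate per *group* (rather than per keyword) checks membership locally:

```java
import java.util.Map;

// Hedged sketch (all names are hypothetical): keywords stay lexed as ID,
// but a table records which subcategory each keyword belongs to.
public class KeywordGroups {
    static final Map<String, String> CATEGORY = Map.of(
        "select",  "STATEMENT_KW",
        "insert",  "STATEMENT_KW",
        "integer", "TYPE_KW",
        "string",  "TYPE_KW");

    // One predicate per group instead of per keyword, e.g. in a grammar:
    //   typeName : {isCategory(input.LT(1).getText(), "TYPE_KW")}? ID ;
    // so a rule position checks one table lookup, not hundreds of sempreds.
    static boolean isCategory(String word, String category) {
        return category.equals(CATEGORY.get(word));
    }

    public static void main(String[] args) {
        System.out.println(isCategory("integer", "TYPE_KW"));      // true
        System.out.println(isCategory("integer", "STATEMENT_KW")); // false
    }
}
```

This differentiates locally, as suggested: the decision of "which kind of word is allowed here" is made at the rule where it matters, not globally in the lexer.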



More information about the antlr-interest mailing list