[antlr-interest] lookahead DFA too big?
Andreas Meyer
andreas.meyer at smartshift.de
Thu Mar 5 02:50:38 PST 2009
Maybe it's possible to partition the set of keywords, but that would be
some effort: figuring out, for all 800 keywords, where each one appears
and in what context it is used. Note that the problem only appeared
after switching to ANTLR 3.1; ANTLR 2.7 was fine with the grammar, and
the generated parser works well.
As advertised in the ANTLR book, I also used semantic predicates to
check locally whether an identifier is actually the keyword I want. This
did not work out: the language is very verbose and most sentences read
like natural English, so the parser sees ID ID ID ID and has to evaluate
up to 800 semantic predicates to find out which keyword comes next.
Having only one token (ID) may have been too extreme; splitting the
keywords into groups does seem interesting, although probably tedious.
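For reference, the local check described above looks roughly like the
following in ANTLR 3 syntax. This is only a sketch; the rule and keyword
names ("moveStmt", "MOVE", "TO") are invented for illustration and are
not from the actual grammar:

```antlr
// Hypothetical sketch: keywords are lexed as plain ID tokens and
// recognized locally with validating semantic predicates that
// inspect the token text via the parser's token stream.
moveStmt
    : {input.LT(1).getText().equals("MOVE")}? ID   // the "MOVE" keyword
      operand
      {input.LT(1).getText().equals("TO")}? ID     // the "TO" keyword
      operand
    ;
```

The drawback mentioned above follows directly: at a decision point where
many such rules are alternatives, the parser may have to try one
predicate per candidate keyword before it can choose.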
In a PowerPoint presentation by Terence Parr, I have seen a slide
mentioning huge generated DFAs. Is there additional material available
that documents the situations that can make these DFAs explode?
Andreas
Loring Craymer wrote:
> 800 token types is a staggeringly large number and indicates that you took the wrong path in dealing with the keyword versus identifier problem. In the cases where I have had this many "keywords", they can usually be decomposed into subcategories and that information kept in a symbol table for use with sempreds. With this many token types, you want to differentiate locally (use sempreds to recognize keywords where they should appear), not globally (all keywords should be recognized as "IDENTIFIER" in the lexer).
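A sketch of the decomposition Loring describes, assuming an ANTLR 3
grammar with Java actions: keyword subcategories live in a table that
the sempreds consult, so one predicate per category replaces one
predicate per keyword. The category set, member method, and rule names
here are invented for illustration:

```antlr
// Hypothetical sketch: keyword categories kept in a table consulted
// by semantic predicates, instead of 800 distinct token types.
@parser::members {
    private final java.util.Set<String> verbKeywords =
        new java.util.HashSet<String>(
            java.util.Arrays.asList("MOVE", "ADD", "DISPLAY"));

    // True if the token's text is a keyword in the "verb" category.
    boolean isVerb(Token t) { return verbKeywords.contains(t.getText()); }
}

statement
    : {isVerb(input.LT(1))}? ID operand+   // any verb keyword, checked locally
    ;
```

In a real grammar the sets would come from a symbol table rather than
being hard-coded, and there would be one such predicate per category
(verbs, clauses, figurative constants, and so on) rather than per keyword.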
>
> --Loring
More information about the antlr-interest mailing list