[antlr-interest] lookahead DFA too big?

Thu Mar 5 05:18:17 PST 2009

Thomas Brandon wrote:
> On Thu, Mar 5, 2009 at 9:50 PM, Andreas Meyer
> <andreas.meyer at smartshift.de> wrote:
>> Maybe it's possible to partition the set of keywords, but that would be
>> some effort: figuring out for 800 keywords, where they appear, what is
>> the context they are used in etc. Note that the problem only appeared
>> after switching to ANTLR 3.1, ANTLR 2.7 was fine with it and the
>> generated parser works well.
> Presumably in ANTLR 2.7 you used the literals table. ANTLR 3 handles
> the keyword/identifier distinction through its nifty DFAs.
> Unfortunately, with such a large number of keywords the DFA needed gets
> pretty complicated, especially if you have a number of other rules
> which allow subsets of your keyword vocabulary.
> You can duplicate the 2.7 behaviour by having an action in your
> identifier rule test a hashtable. Something like:
> tokens {
>   KEYWORDA;
>   KEYWORDB;
> }
> 
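> @lexer::header {
>   import java.util.Hashtable;
> }
> 
> @lexer::members {
>   // Maps keyword text to its token type; filled in once per lexer instance.
>   private Hashtable<String,Integer> literalsTable = new Hashtable<String,Integer>() {{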
>     put("keyworda", KEYWORDA);
>     put("keywordb", KEYWORDB);
>   }};
> 
>   private int checkLiterals(String text) {
>     Integer type = literalsTable.get(text);
>     if(type != null)
>       return type;
>     else
>       return ID;
>   }
> }
> ID: ('a'..'z')+ { $type=checkLiterals($text); };
> 
> Though there was a bug that caused warnings when the tokens section was
> used like that, so you may need to define the token types with fragment
> rules instead (the content is irrelevant, though I don't think they can
> be empty).

Fragment rules can indeed be empty; you'll get parser warnings on a 
combined grammar if you do the above (speaking from recent experience, 
although that was with 3.1.1 - I think).
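
For anyone who wants to try the lookup on its own before wiring it into a
grammar, here is a minimal standalone sketch of the literals-table check
quoted above. The class name and the numeric token types are made up for
illustration; in a real ANTLR 3 lexer the ID/KEYWORDA/KEYWORDB constants
are generated from the grammar and will have different values.

```java
import java.util.HashMap;
import java.util.Map;

class LiteralsTableSketch {
    // Hypothetical token type numbers, standing in for the constants
    // ANTLR would generate from the grammar.
    static final int ID = 4;
    static final int KEYWORDA = 5;
    static final int KEYWORDB = 6;

    // Keyword text -> token type. With ~800 keywords this stays a flat
    // table lookup instead of a large lookahead DFA.
    private static final Map<String, Integer> LITERALS = new HashMap<String, Integer>();
    static {
        LITERALS.put("keyworda", KEYWORDA);
        LITERALS.put("keywordb", KEYWORDB);
    }

    // Returns the keyword's token type, or ID if the text is not a keyword.
    static int checkLiterals(String text) {
        Integer type = LITERALS.get(text);
        return type != null ? type : ID;
    }

    public static void main(String[] args) {
        System.out.println("keyworda -> " + checkLiterals("keyworda"));
        System.out.println("foo      -> " + checkLiterals("foo"));
    }
}
```

The lookup is case-sensitive as written; if the language's keywords are
case-insensitive, one option would be to normalise the text (for example
with toLowerCase()) before the get().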

Sam