[antlr-interest] lookahead DFA too big?
Andreas Meyer
andreas.meyer at smartshift.de
Thu Mar 5 06:35:41 PST 2009
Thomas Brandon wrote:
> On Thu, Mar 5, 2009 at 9:50 PM, Andreas Meyer
> <andreas.meyer at smartshift.de> wrote:
>
>> Maybe it's possible to partition the set of keywords, but that would be
>> some effort: figuring out for 800 keywords, where they appear, what is
>> the context they are used in etc. Note that the problem only appeared
>> after switching to ANTLR 3.1, ANTLR 2.7 was fine with it and the
>> generated parser works well.
>>
> Presumably in ANTLR 2.7 you used the literals table. ANTLR 3 handles
> the keyword/identifier distinction through its nifty DFAs;
> unfortunately, with such a large number of keywords the required DFA
> gets pretty complicated, especially if you have a number of other
> rules that allow subsets of your keyword vocabulary.
> You can duplicate the 2.7 behaviour by having an action in your
> identifier rule test a hashtable. Something like:
> tokens {
> KEYWORDA;
> KEYWORDB;
> }
>
> @lexer::members {
>     // Maps keyword text to its token type; anything not found here
>     // stays a plain ID.
>     private Hashtable<String,Integer> literalsTable =
>         new Hashtable<String,Integer>() {{
>             put("keyworda", KEYWORDA);
>             put("keywordb", KEYWORDB);
>         }};
>
>     private int checkLiterals(String text) {
>         Integer type = literalsTable.get(text);
>         if (type != null)
>             return type;
>         else
>             return ID;
>     }
> }
>
> // Match an identifier, then reassign its type via the literals table.
> ID: ('a'..'z')+ { $type = checkLiterals($text); };
>
> Though there was a bug that caused a warning when the tokens section
> was used like that, so you may need to define the token types with
> fragment rules instead (their content is irrelevant, though I don't
> think they can be empty).
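> A minimal sketch of that fragment-rule workaround (the rule bodies
> are just placeholders; any non-empty content should do, since these
> rules exist only to declare the token types):
>
> fragment KEYWORDA: 'keyworda';
> fragment KEYWORDB: 'keywordb';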
>
>
Sorry for not making this clear, but I am already using that approach: I
have a big map that checks whether an ID is actually a keyword. The
problem is not in the lexer, it is in the lookahead during parsing :-/
Andreas