[antlr-interest] lookahead DFA too big?
Andreas Meyer
andreas.meyer at smartshift.de
Thu Mar 5 06:35:41 PST 2009
Thomas Brandon wrote:
> On Thu, Mar 5, 2009 at 9:50 PM, Andreas Meyer
> <andreas.meyer at smartshift.de> wrote:
>
>> Maybe it's possible to partition the set of keywords, but that would be
>> some effort: figuring out for 800 keywords, where they appear, what is
>> the context they are used in etc. Note that the problem only appeared
>> after switching to ANTLR 3.1, ANTLR 2.7 was fine with it and the
>> generated parser works well.
>>
> Presumably in ANTLR 2.7 you used the literals table. ANTLR 3 handles
> the keyword/identifier distinction through its nifty DFAs;
> unfortunately, with such a large number of keywords the required DFA
> gets pretty complicated, especially if you have a number of other
> rules that allow subsets of your keyword vocabulary.
> You can duplicate the 2.7 behaviour by having an action in your
> identifier rule test a hashtable. Something like:
> tokens {
> KEYWORDA;
> KEYWORDB;
> }
>
> @lexer::members {
>     // Maps keyword text to its token type; anything not found here
>     // stays a plain ID.
>     private Hashtable<String,Integer> literalsTable =
>         new Hashtable<String,Integer>() {{
>             put("keyworda", KEYWORDA);
>             put("keywordb", KEYWORDB);
>         }};
>
>     private int checkLiterals(String text) {
>         Integer type = literalsTable.get(text);
>         if (type != null)
>             return type;
>         else
>             return ID;
>     }
> }
>
> // Match an identifier, then reassign its type via the literals table.
> ID: ('a'..'z')+ { $type = checkLiterals($text); };
>
> Though there was a bug that caused a warning when the tokens section
> was used like that, so you may need to define the token types with
> fragment rules instead (their content is irrelevant, though I don't
> think they can be empty).
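> A minimal sketch of that fragment-rule workaround (the rule bodies
> are just placeholders; any non-empty content should do, since these
> rules exist only to declare the token types):
>
> fragment KEYWORDA: 'keyworda';
> fragment KEYWORDB: 'keywordb';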
>
>
Sorry for not making this clear, but I am already using that approach: I
have a big map that checks whether an ID is actually a keyword. The
problem is not in the lexer, it is in the lookahead during parsing :-/
Andreas