[antlr-interest] lookahead DFA too big?
Sam Barnett-Cormack
s.barnett-cormack at lancaster.ac.uk
Thu Mar 5 05:18:17 PST 2009
Thomas Brandon wrote:
> On Thu, Mar 5, 2009 at 9:50 PM, Andreas Meyer
> <andreas.meyer at smartshift.de> wrote:
>> Maybe it's possible to partition the set of keywords, but that would be
>> some effort: figuring out for 800 keywords, where they appear, what is
>> the context they are used in, etc. Note that the problem only appeared
>> after switching to ANTLR 3.1; ANTLR 2.7 was fine with it and the
>> generated parser works well.
> Presumably in ANTLR 2.7 you used the literals table. ANTLR 3 handles
> the keyword/identifier distinction through its nifty DFAs;
> unfortunately, with such a large number of keywords the DFA needed gets
> pretty complicated, especially if you have a number of other rules
> which allow subsets of your keyword vocabulary.
> You can duplicate the 2.7 behaviour by having an action in your
> identifier rule test a hashtable. Something like:
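> tokens {
>     KEYWORDA;
>     KEYWORDB;
> }
>
> @lexer::header {
>     import java.util.Hashtable;
> }
>
> @lexer::members {
>     // Maps keyword text to its token type (the 2.7-style literals table).
>     private Hashtable<String,Integer> literalsTable =
>         new Hashtable<String,Integer>() {{
>             put("keyworda", KEYWORDA);
>             put("keywordb", KEYWORDB);
>         }};
>
>     // Returns the keyword's token type, or ID if the text is not a keyword.
>     private int checkLiterals(String text) {
>         Integer type = literalsTable.get(text);
>         if (type != null)
>             return type;
>         else
>             return ID;
>     }
> }
>
> // Match a whole word, then retype it if it is in the table.
> ID: ('a'..'z')+ { $type = checkLiterals($text); };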
>
> Though there was a bug that caused warnings when the tokens section was
> used like that, so you may need to instead have fragment rules to
> define the token types (the content is irrelevant, though I don't think
> they can be empty).
Fragment rules can indeed be empty; you'll get parser warnings on a
combined grammar if you do the above (speaking from recent experience,
although that was with 3.1.1 - I think).
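
For anyone who wants to check the lookup logic outside ANTLR, here is a
minimal standalone sketch of the same literals-table idea in plain Java.
The class name LiteralsTable and the numeric token type values are made up
for illustration; in a real lexer ANTLR generates the constants, and the
method would live in @lexer::members as in the quoted grammar.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the literals-table lookup (not ANTLR-generated code).
class LiteralsTable {
    // Hypothetical token type numbers; ANTLR would assign the real ones.
    static final int ID = 4;
    static final int KEYWORDA = 5;
    static final int KEYWORDB = 6;

    // Keyword text -> token type.
    private static final Map<String, Integer> LITERALS = new HashMap<String, Integer>();
    static {
        LITERALS.put("keyworda", KEYWORDA);
        LITERALS.put("keywordb", KEYWORDB);
    }

    // Returns the keyword's token type, or ID when the text is not a keyword.
    static int checkLiterals(String text) {
        Integer type = LITERALS.get(text);
        return (type != null) ? type : ID;
    }
}
```

With 800 keywords the table just gets 800 entries, and the lookup stays a
single hash probe per identifier. Note that the lookup is case-sensitive as
written; lower-casing the text first would be needed for a
case-insensitive language.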
Sam