[antlr-interest] Compression of dfa tables

Andreas Jonsson aj at member.fsf.org
Thu Aug 5 01:54:29 PDT 2010


Hi,

I have just started to write a parser using antlr3 with C as target.
The lexer is growing quickly in size due to the DFA tables: with 61
tables, the lexer is over 4MB.  My feeling is that performance will
suffer from the large number of cache misses this causes.

But each table contains 65536 entries, and almost all of them have
the same value.

Someone might have thought of this idea before, and maybe it has
been rejected for good reasons, but why not use a sorted set of
ranges instead of a flat table?  For instance:

struct DFA_RANGE {
     ANTLR3_UINT16 start;
     ANTLR3_UINT16 end;
     ANTLR3_INT32 state;  /* signed, so -1 can mean "no transition" */
};

struct DFA_TABLE {
     ANTLR3_UINT32 length;
     struct DFA_RANGE ranges[];
};

And you would get a very compact data structure:

static const struct DFA_TABLE dfa6_T2 = { 5, {{0, 34, 20}, {35, 35, 28},
{36, 60, 20}, {61, 61, -1}, {62, 65535, 20}} };

Maybe there are some cases where such a data structure would be larger
than an ordinary table?  In that case you could, of course, add the
possibility to fall back to an ordinary flat table.

Best regards,

Andreas Jonsson


