[antlr-interest] Compression of dfa tables

Jim Idle jimi at temporal-wave.com
Thu Aug 5 09:21:42 PDT 2010


The size of the DFA tables is known about, but despite cache misses, you
won't notice the effect unless you have huge inputs, in which case there
will be plenty of other cache misses anyway. However, when you end up with a
large DFA set, it usually (not always) means that the lexer can be specified
'better' (in terms of allowing ANTLR to deal with it better). 

Can you post your grammar so we can see it? Look for things like: 

C : A?A;

Which could be 

C : A A? ;

And specifying too much restriction in the character sets for tokens because
it says that in the spec (you  want to accept the maximum range that is not
ambiguous, then issue a semantic error, in keeping with pushing errors as
high up the tool chain as you can because you tend to have more context and
better errors that way.

To produce the range structure would require changes in the way the DFA is
given to the code generating templates, which I have no control of. However,
in v4, there is an improved lexer arrangement, which will avoid these huge
tables, and the code generator is more sophisticated in terms of what an
individual target can do before it generates code via the templates (right
now, there is not much of anything that a target can do other than use the
templates).

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Andreas Jonsson
> Sent: Thursday, August 05, 2010 1:54 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Compression of dfa tables
> 
> Hi,
> 
> I have just started to write a parser using antlr3 with C as target.
> The lexer is growing quickly in size due to the dfa tables.  With 61
tables the
> size of the lexer is over 4MB.  My feeling is that the consequence will be
that
> performance will suffer due to a large number of cache misses.
> 
> But the tables contains 65536 entries, where almost all elements have the
> same value.
> 
> Someone might have thought of this idea before and maybe it have been
> rejected for good reasons, but why not use a sorted set of ranges instead
of
> a flat table?  For instance:
> 
> struct DFA_RANGE {
>      ANTLR3_UINT16 start;
>      ANTLR3_UINT16 end;
>      ANTLR3_UINT32 state;
> };
> 
> struct DFA_TABLE {
>      ANTLR3_UINT32 length;
>      struct DFA_RANGE ranges[];
> };
> 
> And you would get a very compact data structure:
> 
> static const struct DFA_TABLE dfa6_T2 { 5, {{0, 34, 20}, {35, 35, 28},
{36, 60, 20},
> {61, 61, -1}, {62, 65535, 20} }};
> 
> Maybe there are some cases where such a data structure will be larger than
> an ordinary table?  In that case you can, of course, add the possibility
to fall
> back to an ordinary table.
> 
> Best regards,
> 
> Andreas Jonsson
> 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address



More information about the antlr-interest mailing list