[antlr-interest] Huge tables in C lexers

Sat Nov 10 04:23:14 PST 2012

Hey Jim,

> Please see ANTLR.markmail.org. The issue is with your grammar. The
> C target lays out static tables that the compiler can then use directly
> in memory structures loaded from the executable target. So you see the
> full data set. Java creates compressed strings which must first be
> created at start up time and then decompressed to generate the same
> tables as C.
> 
> Review your grammar by looking at which of the tables are so big and
> correlating these with your rules. You should be able to see the
> issue.
> 
> Jim
> (At the 200th time of answering this one ;)

... which shows how important this issue is. My lexer is 32 MB in size, purely because of these tables. The cause is that I have to allow almost the entire Unicode BMP in my identifiers; without that, the lexer shrinks to 7 MB. Maybe it would be worth implementing a similar compression scheme in the C target too? Do you know whether the same problem will also exist in ANTLR v4?
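
To make the suggestion concrete, here is a minimal sketch of the idea: store a table as run-length-encoded (count, value) pairs and expand it once at start-up, similar in spirit to what the quoted reply describes for the Java target. This is not ANTLR code. The names `rle_run`, `packed_transitions` and `unpack_rle` are hypothetical, the encoding is an assumption made for illustration, and the sample data is invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical packed form: each entry means "repeat value count times". */
struct rle_run {
    uint32_t count;
    int32_t  value;
};

/* Invented sample data: 3 x -1, 4 x 7, 1 x 2 (8 entries when expanded). */
static const struct rle_run packed_transitions[] = {
    { 3, -1 },
    { 4,  7 },
    { 1,  2 },
};

/*
 * Expand the runs into a heap-allocated flat table.
 * Stores the expanded length in *out_len.
 * Returns NULL if the expanded table is empty or allocation fails.
 * The caller owns the result and must free() it.
 */
static int32_t *unpack_rle(const struct rle_run *runs, size_t n_runs,
                           size_t *out_len)
{
    assert(out_len != NULL);

    /* First pass: compute the expanded size. */
    size_t total = 0;
    for (size_t i = 0; i < n_runs; i++)
        total += runs[i].count;

    *out_len = total;
    if (total == 0)
        return NULL;

    int32_t *table = malloc(total * sizeof *table);
    if (table == NULL) {
        *out_len = 0;
        return NULL;
    }

    /* Second pass: write each value count times. */
    size_t pos = 0;
    for (size_t i = 0; i < n_runs; i++)
        for (uint32_t k = 0; k < runs[i].count; k++)
            table[pos++] = runs[i].value;

    return table;
}
```

Calling `unpack_rle(packed_transitions, 3, &len)` once during lexer initialisation would yield the flat table that is emitted statically today. The trade-off is the one already described in the quoted reply: the executable gets smaller, but the tables must be built at start-up and then live in heap memory rather than in the read-only data of the executable.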

Mike
-- 
www.soft-gems.net