[antlr-interest] Huge tables in C lexers

Sam Harwell sam at tunnelvisionlabs.com
Sat Nov 10 15:11:24 PST 2012


The Java version keeps *both* the compressed and decompressed arrays in memory. The files generated by the C target are much larger, but the actual runtime overhead is lower since it only has to keep the uncompressed tables in memory.
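As a rough illustration of that trade-off, here is a minimal C sketch of the kind of run-length unpacking step a compressed table needs at startup. It assumes a simple (count, value) pair encoding; the name rle_unpack and the sample data in the usage note are hypothetical and not taken from any ANTLR runtime.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of any ANTLR runtime): expand a
 * run-length encoded table stored as (count, value) pairs into a
 * flat array. Returns NULL if allocation fails; the caller frees
 * the result.
 */
static short *rle_unpack(const short *packed, size_t n_pairs, size_t *out_len)
{
    size_t total = 0;
    size_t pos = 0;
    size_t i;
    short *table;

    /* First pass: add up the run lengths to size the flat table. */
    for (i = 0; i < n_pairs; i++) {
        assert(packed[2 * i] >= 0);
        total += (size_t)packed[2 * i];
    }

    table = malloc((total ? total : 1) * sizeof *table);
    if (table == NULL)
        return NULL;

    /* Second pass: write each value 'count' times. */
    for (i = 0; i < n_pairs; i++) {
        short count = packed[2 * i];
        short value = packed[2 * i + 1];
        short k;
        for (k = 0; k < count; k++)
            table[pos++] = value;
    }

    *out_len = total;
    return table;
}
```

Calling rle_unpack on the six shorts { 3, -1, 2, 7, 4, -1 } with n_pairs = 3 yields a nine-entry table. While both arrays are alive the process holds the packed and the unpacked form at once, whereas a statically initialized C table only ever exists in its flat form.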

ANTLR 4 uses a completely different representation for the tables. The memory overhead is especially improved for grammars with Unicode support, but speed takes a tiny hit for each Unicode character that actually appears in the input (shouldn't ever be a problem).
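To give a feel for why a non-tabular representation can be much smaller for Unicode-heavy grammars, here is a small C sketch that stores a character class as sorted inclusive ranges and searches them per character. This is only an assumption-laden illustration: the char_range type and range_set_contains function are made up for this example and are not ANTLR 4's actual data structures.

```c
#include <stddef.h>

/* Hypothetical type: one inclusive range of code points. */
typedef struct {
    unsigned lo;
    unsigned hi;
} char_range;

/*
 * Binary search over ranges that are sorted and non-overlapping.
 * Returns 1 if c falls inside one of them, 0 otherwise.
 */
static int range_set_contains(const char_range *set, size_t n, unsigned c)
{
    size_t lo = 0;
    size_t hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c < set[mid].lo)
            hi = mid;
        else if (c > set[mid].hi)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}
```

An identifier-start class such as A-Z, underscore, a-z, and 0x0080-0xFFFF fits in four char_range entries, where a dense per-state table would need one entry for every code point. The price is a short search for each character instead of a single array index.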

--
Sam Harwell
Owner, Lead Developer
http://tunnelvisionlabs.com

-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Mike Lischke
Sent: Saturday, November 10, 2012 6:23 AM
To: Jim Idle
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Huge tables in C lexers


Hey Jim,

> Please see ANTLR.markmail.org. The issue is with your grammar. The 
> C target lays out static tables that the compiler can then use directly 
> in memory structures loaded from the executable target. So you see the 
> full data set. Java creates compressed strings which must first be 
> created at start up time and then decompressed to generate the same 
> tables as C.
> 
> Review your grammar by looking at which of the tables are so big and 
> correlating these with your rules. You should be able to see the 
> issue.
> 
> Jim
> (At the 200th time of answering this one ;)

... which shows how important this issue is. My lexer is 32MB in size, just because of these tables. This stems from the fact that I have to allow almost the entire Unicode BMP for my identifiers. Without that, the lexer shrinks to 7MB. Maybe it would be worth implementing a similar compression feature in the C target too? Do you know if the same problem also exists in ANTLR v4?

Mike
--
www.soft-gems.net



