[antlr-interest] Size of generated lexer code

Jim Idle jimi at temporal-wave.com
Tue Jul 1 14:50:52 PDT 2008


On Tue, 2008-07-01 at 16:05 -0400, David Goldberg wrote:
> Is it normal for a lexer with approximately 125 tokens to generate a
> C# lexer file of about 7500k? That seems rather large to me. I am
> running ANTLR 3.01


This usually happens because you either specify the alts in a rule in a
way that causes huge tables, or you are trying to make the lexer scan for
all sorts of character ranges, say to enforce naming rules for variables.
When the initial set is the entire Unicode range, then of course you end
up with huge tables and rulesets.

It is generally a lot easier to accept any old characters for something
like a variable, then validate them afterwards. This way you can also
generate a semantic message such as "Variable xxxxxy cannot use the
character y in its name." instead of the lexer producing errors or a
token sequence you are not expecting. Whether you can do this depends on
what your token set is, of course. Why don't you post your lexer?
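
To illustrate the accept-then-validate idea, here is a rough sketch in Python rather than in ANTLR grammar syntax. It is not generated lexer code and does not use any ANTLR API. The delimiter set in LOOSE_IDENT and the naming rule in is_legal_ident_char are assumptions made up for the example; substitute whatever your language actually requires.

```python
import re

# Assumption: identifiers are delimited by whitespace and a handful of
# operator/punctuation characters. The pattern is deliberately permissive,
# so the scanner needs no large character-range tables.
LOOSE_IDENT = re.compile(r"[^\s=+\-*/();,]+")


def is_legal_ident_char(ch, first):
    """Hypothetical naming rule: letters or '_' anywhere, digits after the first char."""
    if ch == "_" or ch.isalpha():
        return True
    return (not first) and ch.isdigit()


def validate_identifier(text):
    """Return one semantic error message per illegal character in text."""
    errors = []
    for i, ch in enumerate(text):
        if not is_legal_ident_char(ch, first=(i == 0)):
            errors.append(
                f"Variable {text} cannot use the character {ch} in its name."
            )
    return errors


def scan_identifiers(source):
    """Loosely split source into candidate identifiers without validating them."""
    return LOOSE_IDENT.findall(source)


if __name__ == "__main__":
    # Scan loosely first, then report naming problems as semantic messages.
    for candidate in scan_identifiers("pri$ce = total_1;"):
        for message in validate_identifier(candidate):
            print(message)
```

In a real grammar the loose pattern would be the lexer rule for the variable token, and the validation would run in an action or a later pass over the tokens, so the character-class checks never end up in the lexer's tables.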

You might also try 3.1beta and see if it makes a difference.

Jim