[antlr-interest] How to reduce the size of the generated lexer?

Jim Idle jimi at temporal-wave.com
Sun Mar 6 17:39:40 PST 2011


It means that rule is not well formed, but without seeing your lexer I
can't tell you more :-) Generally, though, rules that say "everything but
these", combined with overly specialized rules, will cause you issues. Try
to be as relaxed as you can in the lexer without generating ambiguities,
then check for valid characters in code. That way the errors you report
will be semantic in nature, such as "Identifiers cannot contain characters
like 'x'", instead of "Unexpected character 'x' skipped", and you no longer
depend on ANTLR finding some way to encode your rules.

Your rule below, though, looks highly ambiguous based on its sets. Perhaps
all you are looking for is a final catch-all rule at the end of your lexer:

ANYTHINGELSE : . { myErrorMessage(invalid); } ;

Additionally, adding + to such a rule will generate huge tables and will
probably not make sense. Just removing the + will help, but is unlikely to
be the correct solution. Post what you are trying to do rather than what
is going wrong with what you have.

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Mu Qiao
> Sent: Sunday, March 06, 2011 5:19 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] How to reduce the size of the generated lexer?
>
> Hi
>
> I use the C runtime, libantlr3c-3.1.3. The generated lexer is bigger
> than 10 MB, full of arrays of integers. I tried to see what was going on
> and found this rule:
> NQCHAR_NO_ALPHANUM
>     :   ~('\n'|'\r'|' '|'\t'|'\\'|CARET|QMARK|COLON|AT|SEMIC|POUND|SLASH
>         |BANG|TIMES|COMMA|PIPE|AMP|MINUS|PLUS|PCT|EQUALS|LSQUARE|RSQUARE
>         |RPAREN|LPAREN|RBRACE|LBRACE|DOLLAR|TICK|DOT|LT|GT|SQUOTE|QUOTE
>         |'a'..'z'|'A'..'Z'|'0'..'9')+;
>
> If I remove the rule, the lexer is only 400 KB.
>
> I'm still new to ANTLR and I'm not sure whether there is any way to
> refactor the rule to reduce the size of the lexer. Could anyone please
> help me out?
>
> --
> Best wishes,
> Mu Qiao
> GnuPG fingerprint: 92B1 B0C4 8D14 F8C4 EFA5  3ACC 30B3 0DE4 17B1 57E9

