[antlr-interest] How to reduce the size of the generated lexer?

Mu Qiao qiaomuf at gmail.com
Sun Mar 6 18:12:59 PST 2011


The rule is used in another rule to represent the part of a string that
doesn't contain reserved words. You're right that the grammar is not
well formed, and I should try to refactor it. This rule is only used by
the rule that represents strings, so I marked it as a fragment lexer
rule, and now the size of the lexer seems acceptable to me. It confused
me at first because the Java target generated a lexer of acceptable size
for the original grammar. Thanks for answering.

On Mon, Mar 7, 2011 at 1:39 AM, Jim Idle <jimi at temporal-wave.com> wrote:
> It means that that rule is not well formed, but without seeing your lexer
> I can't tell you more :-) Generally, though, rules that say "everything
> but these", combined with overly specialized rules, will cause you
> problems. Try to be as relaxed as you can in the lexer without creating
> ambiguities, then check for valid characters in code. That way the errors
> you report will be semantic, such as "Identifiers cannot contain
> characters like 'x'", instead of "Unexpected character 'x' skipped", and
> you no longer depend on ANTLR finding some way to encode your rules.
>
> Your rule below, though, looks highly ambiguous given the sets it uses.
> Perhaps all you need is a final catch-all rule at the end of the lexer:
>
> ANYTHINGELSE : . { myErrorMessage(invalid); } ;
>
> Adding + to such a rule will also generate huge tables and probably does
> not make sense. Just removing the + will help, but is unlikely to be the
> correct solution. Post what you are trying to do rather than what is
> going wrong with what you have.
>
> Jim
>
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Mu Qiao
>> Sent: Sunday, March 06, 2011 5:19 PM
>> To: antlr-interest at antlr.org
>> Subject: [antlr-interest] How to reduce the size of the generated lexer?
>>
>> Hi
>>
>> I use the C runtime libantlr3c-3.1.3. The generated lexer is bigger
>> than 10 MB and full of integer arrays. I tried to see what was going on
>> and found this rule:
>> NQCHAR_NO_ALPHANUM
>>     :   ~('\n'|'\r'|' '|'\t'|'\\'|CARET|QMARK|COLON|AT|SEMIC|POUND|SLASH
>>          |BANG|TIMES|COMMA|PIPE|AMP|MINUS|PLUS|PCT|EQUALS|LSQUARE|RSQUARE
>>          |RPAREN|LPAREN|RBRACE|LBRACE|DOLLAR|TICK|DOT|LT|GT|SQUOTE|QUOTE
>>          |'a'..'z'|'A'..'Z'|'0'..'9')+;
>>
>> If I remove the rule, the lexer is only 400 KB.
>>
>> I'm still new to ANTLR and I'm not sure whether there is a way to
>> refactor the rule to reduce the size of the lexer. Could anyone please
>> help me out?
>>
>> --
>> Best wishes,
>> Mu Qiao
>> GnuPG fingerprint: 92B1 B0C4 8D14 F8C4 EFA5  3ACC 30B3 0DE4 17B1 57E9
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>
>



-- 
Best wishes,
Mu Qiao
GnuPG fingerprint: 92B1 B0C4 8D14 F8C4 EFA5  3ACC 30B3 0DE4 17B1 57E9

