[antlr-interest] ANTLR code too large

Jim Idle jimi at temporal-wave.com
Wed Jan 30 08:13:43 PST 2008


Jan,

My experience has been that when a very large lexer is generated, it is 
usually because of some slight 'abuse' (so to speak) of lexer 
specifications, and there is usually a better way to express the same 
thing with a marked reduction in size. While ANTLR is pretty smart, it 
isn't a magic bullet ;-)

Without all of your lexer rules, it is difficult to say which ones would 
cause you problems, but look for these kinds of things:

1) Rules that should be fragment rules (I am assuming this is ANTLR3, of 
course), because they are only 'called' by other lexer rules and should 
not be set up to return tokens to the parser;
2) Optional lead-ins in lexer rules. Though they may be perfectly 
correct, you can often find a better way to put them together that will 
reduce complexity a lot. So: 'XX' ('XX' ':') 'XX' can be 'XX' 'XX' (':' 
'XX')?
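
As a rough illustration of both points, a lexer might be arranged 
something like the sketch below. The grammar name and all rule names 
are made up for the example and are not taken from Jan's grammar:

```antlr
lexer grammar LeadInSketch;   // hypothetical name

// Helper rules: only ever 'called' by other lexer rules, so they are
// marked fragment and never returned to the parser as tokens.
fragment LETTER : 'a'..'z' | 'A'..'Z' ;
fragment WORD   : LETTER+ ;

// With an optional lead-in, the rule would read:
//     PAIR : (WORD ':')? WORD ;
// The version below matches the same input, but puts the common prefix
// first and makes the tail optional instead.
PAIR : WORD (':' WORD)? ;
```

Both forms of PAIR accept either WORD or WORD ':' WORD; the second 
simply lets the lexer commit to the first WORD before it has to decide 
anything.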

Lexer rules (ANTLR3) don't allow return values, but I think that they 
parse correctly at the moment. Your best bet is (probably) to define a 
set of fragment rules and then set $type to one of them:

fragment MONDAY : 'Monday' ;
fragment TUESDAY : 'Tuesday' ;
...

DAY
	: MONDAY	{ $type = MONDAY; }
	| TUESDAY	{ $type = TUESDAY; }
....

day
	: MONDAY | TUESDAY | ...
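
Spelled out for all seven days, and borrowing the adc.util.Day values 
from Jan's post, the whole arrangement might look something like the 
following. The grammar name is hypothetical, and this is an untested 
outline under those assumptions rather than a finished grammar:

```antlr
grammar DaySketch;   // hypothetical name

// Parser rule: this is where the return value lives, since lexer
// rules cannot return one.
day returns [adc.util.Day value]
    : MONDAY    { $value = adc.util.Day.MONDAY; }
    | TUESDAY   { $value = adc.util.Day.TUESDAY; }
    | WEDNESDAY { $value = adc.util.Day.WEDNESDAY; }
    | THURSDAY  { $value = adc.util.Day.THURSDAY; }
    | FRIDAY    { $value = adc.util.Day.FRIDAY; }
    | SATURDAY  { $value = adc.util.Day.SATURDAY; }
    | SUNDAY    { $value = adc.util.Day.SUNDAY; }
    ;

// Fragments: they define the token types and the text to match, but
// are only reached through DAY.
fragment MONDAY    : 'Monday' ;
fragment TUESDAY   : 'Tuesday' ;
fragment WEDNESDAY : 'Wednesday' ;
fragment THURSDAY  : 'Thursday' ;
fragment FRIDAY    : 'Friday' ;
fragment SATURDAY  : 'Saturday' ;
fragment SUNDAY    : 'Sunday' ;

// The one real lexer rule: each alternative relabels the token, so the
// parser sees MONDAY, TUESDAY, etc. and never sees DAY itself.
DAY
    : MONDAY    { $type = MONDAY; }
    | TUESDAY   { $type = TUESDAY; }
    | WEDNESDAY { $type = WEDNESDAY; }
    | THURSDAY  { $type = THURSDAY; }
    | FRIDAY    { $type = FRIDAY; }
    | SATURDAY  { $type = SATURDAY; }
    | SUNDAY    { $type = SUNDAY; }
    ;
```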

If your parser grammar is too big, then it may just be that it is a huge 
grammar. Some solutions for this type of thing are coming in ANTLR 3.1, 
I believe, which is delayed as much as anything because I am halfway 
through catching up the runtime to the Java version ;-)

Jim

> -----Original Message-----
> From: Jan Nielsen [mailto:jan.sture.nielsen at gmail.com]
> Sent: Tuesday, January 29, 2008 5:57 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] ANTLR code too large
> 
> My ANTLR grammar snippet below generates code that is "too large" from
> a compiler/class-file perspective; if I use the lexer rule DAY_OF_WEEK
> everything is fine but it doesn't parse correctly because of the lack
> of return values. I can use "noinlinedfa" [1] to address the lexer
> rule size* problem, but I'm wondering if there's a better way. In my
> grammar [2], I define a dayOfWeek parser rule (I suppose I could pass
> the result back through the parser in a lexer rule action?). Is there
> a better way to do this sort of thing?
> 
> Thanks,
> 
> -Jan
> 
> 
> (*) Does anyone know how to tell the Maven 2 antlr-maven-plugin [3] to
> use -Xnoinlinedfa during the code-generation phase?
> [1] http://www.antlr.org/pipermail/antlr-interest/2007-December/025360.html
> [2] http://www.antlr.org/pipermail/antlr-interest/2008-January/026019.html
> [3] http://mojo.codehaus.org/antlr-maven-plugin/
> 
> dayOfWeek returns [adc.util.Day value]
> /*
>     : DAY_OF_WEEK                { $value = adc.util.Day.MONDAY; }
>     ;
> */
> 
>     : 'Monday'                   { $value = adc.util.Day.MONDAY; }
>     | 'Tuesday'                  { $value = adc.util.Day.TUESDAY; }
>     | 'Wednesday'                { $value = adc.util.Day.WEDNESDAY; }
>     | 'Thursday'                 { $value = adc.util.Day.THURSDAY; }
>     | 'Friday'                   { $value = adc.util.Day.FRIDAY; }
>     | 'Saturday'                 { $value = adc.util.Day.SATURDAY; }
>     | 'Sunday'                   { $value = adc.util.Day.SUNDAY; }
>     ;
> 
> DAY_OF_WEEK returns [adc.util.Day value]
>     : 'Monday'                   { $value = adc.util.Day.MONDAY; }
>     | 'Tuesday'                  { $value = adc.util.Day.TUESDAY; }
>     | 'Wednesday'                { $value = adc.util.Day.WEDNESDAY; }
>     | 'Thursday'                 { $value = adc.util.Day.THURSDAY; }
>     | 'Friday'                   { $value = adc.util.Day.FRIDAY; }
>     | 'Saturday'                 { $value = adc.util.Day.SATURDAY; }
>     | 'Sunday'                   { $value = adc.util.Day.SUNDAY; }
>     ;



