[antlr-interest] ANTLR3 Large Grammar Issue

Wed Nov 28 09:48:02 PST 2007

Daniel Rippel wrote:
> I have a large grammar ~500 tokens.  My target language is Java.
>
> After I run ANTLR to generate my lexer, the lexer fails to compile with the following error:
>
> TestLexer.java:11428: code too large
>     public void mTokens() throws RecognitionException {
>                 ^
> 1 error
>
> And indeed the mTokens function is 10000+ lines of code.
> I also noticed and tried the -Xnoinlinedfa option. This just pushes the "code too large" issue over to the Parser class.
>
> So, my question is:
> Is there an antlr3 limit on the size of grammars?
>
> I also noticed that grammar inheritance is no longer supported in v3.
> So perhaps I could go back to v2 and break the grammar into smaller chunks that way.
>
I haven't had this problem in the lexer, but I have had it in the 
parser. Typically I have found that left-factoring my rules 
eliminates the problem: the prediction code has to look ahead too far to 
decide which rule to enter (in the lexer, which token to generate). One 
way of dealing with this in the lexer is to use artificial tokens, i.e. 
token types that have no rule of their own and are assigned from an 
action. For instance,

Instead of

    AB : 'A' 'B' ;
    AC : 'A' 'C' ;

you can write

    tokens { AB; AC; }   // declare the token types assigned by the action below

    A : 'A'
        ( 'B' {$type=AB;}
        | 'C' {$type=AC;}
        )
      ;
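To make the effect of the factored rule concrete at the Java level, here is a minimal hand-written sketch. It is not ANTLR's generated code; the class FactoredLexerSketch, its token constants and the tokenize method are hypothetical. The top-level dispatch commits to rule A after looking at a single character, and the second character only decides which type to report, mirroring {$type=AB;} and {$type=AC;} in the grammar.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-written lexer mirroring the left-factored rule A (not ANTLR output). */
class FactoredLexerSketch {
    // Hypothetical token type constants standing in for AB and AC.
    static final int AB = 4;
    static final int AC = 5;

    private final String input;
    private int pos = 0;

    FactoredLexerSketch(String input) {
        this.input = input;
    }

    /** Character at offset i (1-based) from the current position, or -1 at end of input. */
    private int la(int i) {
        int p = pos + i - 1;
        return p < input.length() ? input.charAt(p) : -1;
    }

    /** Top-level dispatch: one character of lookahead is enough to pick rule A. */
    int nextToken() {
        switch (la(1)) {
            case 'A':
                return mA();
            default:
                throw new IllegalStateException("no viable token at offset " + pos);
        }
    }

    /** Rule A: match the shared prefix, then let the next character choose the type. */
    private int mA() {
        pos++; // consume 'A'
        switch (la(1)) {
            case 'B':
                pos++;
                return AB; // corresponds to {$type=AB;}
            case 'C':
                pos++;
                return AC; // corresponds to {$type=AC;}
            default:
                throw new IllegalStateException("expected 'B' or 'C' at offset " + pos);
        }
    }

    /** Tokenizes the whole input into a list of token types. */
    List<Integer> tokenize() {
        List<Integer> types = new ArrayList<>();
        while (la(1) != -1) {
            types.add(nextToken());
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(new FactoredLexerSketch("ABACAB").tokenize()); // [4, 5, 4]
    }
}
```

The point of the design is that the decision among all token rules never needs more than LA(1) here; the extra character is examined only inside the one rule that shares the prefix.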

If no other token starts with 'A', mTokens can commit to rule A as soon 
as LA(1) is 'A'. It would be better if ANTLR itself noticed that a 
generated method was approaching the JVM's 64 KB per-method bytecode 
limit (which is what javac's "code too large" error reports) and broke 
the multi-level switches up into calls to helper methods, but that is 
something for the future.
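The helper-method splitting wished for above can be pictured with another hand-written sketch. All names (SplitPredictorSketch, predict, predictAfterA, predictAfterX and the token constants) are hypothetical, and this is not how any ANTLR release generates code. Instead of one method holding a nested switch for every token, the outer switch on the first character delegates each branch to a small method, so no single method grows toward the bytecode limit.

```java
/** Hypothetical sketch: a two-level prediction switch split into per-prefix helper methods. */
class SplitPredictorSketch {
    // Hypothetical token types for illustration only.
    static final int AB = 4, AC = 5, XY = 6, XZ = 7;

    /** Outer decision on LA(1); each branch calls its own method instead of an inline nested switch. */
    static int predict(String input, int pos) {
        switch (la(input, pos)) {
            case 'A':
                return predictAfterA(input, pos + 1);
            case 'X':
                return predictAfterX(input, pos + 1);
            default:
                return -1; // no viable alternative
        }
    }

    /** Second-level decision for tokens starting with 'A'. */
    private static int predictAfterA(String input, int pos) {
        switch (la(input, pos)) {
            case 'B': return AB;
            case 'C': return AC;
            default:  return -1;
        }
    }

    /** Second-level decision for tokens starting with 'X'. */
    private static int predictAfterX(String input, int pos) {
        switch (la(input, pos)) {
            case 'Y': return XY;
            case 'Z': return XZ;
            default:  return -1;
        }
    }

    /** Character at pos, or -1 past the end of the input. */
    private static int la(String input, int pos) {
        return pos < input.length() ? input.charAt(pos) : -1;
    }

    public static void main(String[] args) {
        System.out.println(predict("AC", 0)); // 5
        System.out.println(predict("XY", 0)); // 6
        System.out.println(predict("Q", 0));  // -1
    }
}
```

Each helper stays small no matter how many first characters the grammar has, which is the property that keeps every generated method under the JVM limit.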