[antlr-interest] ANTLR3 Large Grammar Issue
shmuel siegel
antlr at shmuelhome.mine.nu
Wed Nov 28 09:48:02 PST 2007
Daniel Rippel wrote:
> I have a large grammar ~500 tokens. My target language is Java.
>
> After I run ANTLR to generate my lexer, the lexer fails to compile with the following error:
>
> TestLexer.java:11428: code too large
> public void mTokens() throws RecognitionException {
> ^
> 1 error
>
> And indeed the mTokens function is 10000+ lines of code.
> I also tried the -Xnoinlinedfa option. That just pushes the "code too large" error over to the Parser class.
>
> So, my question is:
> Is there an antlr3 limit on the size of grammars?
>
> I also noticed that grammar inheritance was dropped in v3,
> so perhaps I could go back to v2 and break the grammar into smaller chunks that way.
>
>
>
I haven't had this problem in the lexer, but I have had it in the
parser. The error comes from the JVM, which limits the bytecode of a
single method to 64 KB, and the method blows up because the prediction
code has to look too far ahead to decide which alternative (in the
lexer, which token) to choose. I have typically found that left-factoring
my rules eliminates the problem. In the lexer, one way of doing this is
to use artificial (imaginary) tokens. For instance,
instead of

    AB : 'A' 'B' ;
    AC : 'A' 'C' ;

you can declare AB and AC as imaginary tokens and write

    tokens { AB; AC; }

    A : 'A'
        ( 'B' {$type = AB;}
        | 'C' {$type = AC;}
        )
      ;
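To make the effect concrete, here is a rough hand-written Java sketch of what the left-factored rule A amounts to. It is not ANTLR's generated code: the class name, the la()/match() helpers (loosely modeled on LA(i) and match()), and the token type numbers are all hypothetical. What it shows is that the top-level dispatch needs only one character of lookahead, and the shared 'A' prefix is matched once, inside the rule.

```java
// Hypothetical hand-written lexer, NOT ANTLR output: it only illustrates
// how the left-factored rule A : 'A' ( 'B' | 'C' ) decides its token.
public class FactoredLexerSketch {
    public static final int AB = 4, AC = 5; // hypothetical token type numbers

    private final String input;
    private int pos = 0;
    private int type;

    public FactoredLexerSketch(String input) { this.input = input; }

    // Like LA(i): the i-th lookahead character, or -1 at end of input.
    private int la(int i) {
        int p = pos + i - 1;
        return p < input.length() ? input.charAt(p) : -1;
    }

    // Consume the expected character or fail.
    private void match(int c) {
        if (la(1) != c) {
            throw new IllegalStateException("expected '" + (char) c + "' at " + pos);
        }
        pos++;
    }

    // Counterpart of rule A: shared prefix once, then one-char decision.
    private void mA() {
        match('A');
        switch (la(1)) {
            case 'B': match('B'); type = AB; break;
            case 'C': match('C'); type = AC; break;
            default: throw new IllegalStateException("expected 'B' or 'C' at " + pos);
        }
    }

    // Counterpart of mTokens: LA(1) alone picks the rule to call.
    public int nextToken() {
        switch (la(1)) {
            case 'A': mA(); return type;
            default: throw new IllegalStateException("no token starts at " + pos);
        }
    }

    public static void main(String[] args) {
        FactoredLexerSketch lexer = new FactoredLexerSketch("ABAC");
        System.out.println(lexer.nextToken()); // 4 (AB)
        System.out.println(lexer.nextToken()); // 5 (AC)
    }
}
```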
If there aren't any other tokens that start with 'A', mTokens can make
its decision as soon as it sees LA(1)=='A'. It would be better if ANTLR
itself noticed that a method was getting too big and broke the
multi-level switches up into calls to helper methods, but that is
something for the future.
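The kind of splitting wished for here can be sketched by hand in Java. This is a hypothetical illustration, not something ANTLR generates: the class, method names, and token numbers are made up. The outer switch looks only at the first character and delegates each branch to its own helper method instead of inlining a nested switch.

```java
// Hypothetical sketch: one large nested switch split into a small outer
// dispatch plus one helper method per first character.
public class SplitDispatchSketch {
    public static final int AB = 1, AC = 2, BA = 3, BB = 4; // hypothetical token types

    // Top-level decision: only the first character is examined here.
    public static int predict(String s) {
        if (s.isEmpty()) throw new IllegalArgumentException("empty input");
        switch (s.charAt(0)) {
            case 'A': return predictA(s);
            case 'B': return predictB(s);
            default: throw new IllegalArgumentException("no token starts with " + s.charAt(0));
        }
    }

    // Second-level decision for tokens starting with 'A', in its own method.
    private static int predictA(String s) {
        switch (second(s)) {
            case 'B': return AB;
            case 'C': return AC;
            default: throw new IllegalArgumentException("bad token after 'A'");
        }
    }

    // Second-level decision for tokens starting with 'B', in its own method.
    private static int predictB(String s) {
        switch (second(s)) {
            case 'A': return BA;
            case 'B': return BB;
            default: throw new IllegalArgumentException("bad token after 'B'");
        }
    }

    // Second character, or '\0' if the input is only one character long.
    private static char second(String s) {
        return s.length() > 1 ? s.charAt(1) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(predict("AC")); // 2
        System.out.println(predict("BA")); // 3
    }
}
```

The total amount of code is the same; it is just spread over several methods, each of which stays well under the JVM's per-method limit as long as no single prefix group is itself huge.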