[antlr-interest] Antlr 3 generates very big java classes which do not compile

Tue Dec 26 08:40:29 PST 2006

Hi!

On 26. Dec 2006, at 16:52 , gmx wrote:

>
> So the actual question is whether to create an additional Java
> backend for ANTLR which will emit not just one source file (*.java)
> for the lexer and one for the parser, but several files, so that
> these still comply with the Java specification and can be compiled.
> E.g., instead of one Java source file for the whole parser, a source
> file is created for each rule in the grammar.

I doubt that this will be part of the standard distribution, but Ter
might think otherwise (although this will definitely not make it into
the 3.0 release, due to more important issues needing attention).

If it is a big concern for you right now, you could manually split up
the files. As far as I know, though, a Java class cannot span several
source files (and each file may hold only one public top-level class),
so you'd have to split the recognizer across multiple classes, which
will pose several problems of its own.

In the end, there must be one mTokens method. You could of course
split that up into two or more methods, but for the moment you're on
your own, sorry.
Creating a new target probably wouldn't cut it, because the
CodeGenerator is responsible for putting the output together. The
splitting must happen there (and won't be pretty, I'm afraid...).
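
To make the "split it into two or more methods" idea concrete, here is
a minimal hand-written sketch. It assumes the compile failure comes
from the JVM's 64 KB limit on the bytecode of a single method, which
this thread does not confirm. None of the names below (the class,
tokens, tokensPartA, tokensPartB, the token type constants) come from
ANTLR's generated code; they are made up for illustration.

```java
// Hypothetical sketch, not ANTLR output: one entry point that only
// delegates, so no single method has to hold all the decision code.
final class SplitTokensSketch {
    static final int ID = 1;
    static final int INT = 2;
    static final int WS = 3;
    static final int UNKNOWN = -1;

    // The single entry point stays, but it is now tiny.
    static int tokens(String input, int pos) {
        char c = input.charAt(pos);
        int type = tokensPartA(c);
        return type != UNKNOWN ? type : tokensPartB(c);
    }

    // First chunk of what would otherwise be one oversized method.
    private static int tokensPartA(char c) {
        if (Character.isLetter(c) || c == '_') {
            return ID;
        }
        return UNKNOWN;
    }

    // Second chunk; only consulted when the first one gives up.
    private static int tokensPartB(char c) {
        if (Character.isDigit(c)) {
            return INT;
        }
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            return WS;
        }
        return UNKNOWN;
    }

    public static void main(String[] args) {
        String input = "x1 ";
        for (int i = 0; i < input.length(); i++) {
            System.out.println(i + " -> " + tokens(input, i));
        }
    }
}
```

A real lexer's decisions are far more entangled than this, which is
why doing the split by hand in generated code is unpleasant.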

Oh, I just remembered the following:
It's probably really bad advice, but because it's Christmas and I feel
like giving bad advice at this time of the year, I'll give it
nonetheless ;)
There's a static boolean in CodeGenerator.java which toggles whether
to create inline DFAs or always create DFA objects for prediction. Be
careful, this is a global switch: you will not get *any* inline (if-
and switch-based) DFAs anymore.
With it changed to false, the code that the modified ANTLR generates
for your grammar compiles just fine for me.
Essentially, it splits the mTokens method into one call to DFA.predict
and the big switch which calls the predicted alt's methods. You could
get away with this for now.
Try both settings and compare the output (esp. mTokens()). You'll see
what I mean.
Compiling the generated lexer source takes about 6 seconds on my
machine...FWIW (=nothing ;)).
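
For anyone who wants to see the shape of "one predict call plus a
switch" without generating anything, here is a small hand-written
sketch. It is not ANTLR's DFA class or its generated code: the table
layout, classify, predict, nextToken and the mID/mINT/mWS helpers are
all hypothetical names chosen for this example. The point is only that
the decision logic lives in data (a transition table) instead of in
nested if/switch code inside one huge method.

```java
// Hypothetical sketch of table-driven prediction followed by a switch.
final class PredictThenSwitchSketch {
    // Character classes used as column indexes into TRANSITION.
    static final int LETTER = 0, DIGIT = 1, SPACE = 2, OTHER = 3;

    // TRANSITION[state][charClass] gives the next state, -1 = none.
    // Only state 0 (the start state) has outgoing edges here.
    static final int[][] TRANSITION = {
        {1, 2, 3, -1}, // state 0: start
        {},            // state 1: accepts alt 1
        {},            // state 2: accepts alt 2
        {},            // state 3: accepts alt 3
    };

    // ACCEPT[state] is the predicted alternative, 0 = not accepting.
    static final int[] ACCEPT = {0, 1, 2, 3};

    static int classify(char c) {
        if (Character.isLetter(c) || c == '_') return LETTER;
        if (Character.isDigit(c)) return DIGIT;
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') return SPACE;
        return OTHER;
    }

    // Walks the table from the start state; returns 0 if nothing matches.
    static int predict(String input, int pos) {
        int state = 0;
        int i = pos;
        while (ACCEPT[state] == 0) {
            if (i >= input.length()) return 0;
            int next = TRANSITION[state][classify(input.charAt(i++))];
            if (next < 0) return 0;
            state = next;
        }
        return ACCEPT[state];
    }

    // The "big switch": small, because prediction already happened.
    static String nextToken(String input, int pos) {
        switch (predict(input, pos)) {
            case 1: return "ID:" + input.substring(pos, mID(input, pos));
            case 2: return "INT:" + input.substring(pos, mINT(input, pos));
            case 3: return "WS:" + input.substring(pos, mWS(input, pos));
            default: return "ERROR";
        }
    }

    // Each rule method returns the index just past the matched text.
    static int mID(String s, int i) {
        while (i < s.length()
                && (classify(s.charAt(i)) == LETTER || classify(s.charAt(i)) == DIGIT)) i++;
        return i;
    }

    static int mINT(String s, int i) {
        while (i < s.length() && classify(s.charAt(i)) == DIGIT) i++;
        return i;
    }

    static int mWS(String s, int i) {
        while (i < s.length() && classify(s.charAt(i)) == SPACE) i++;
        return i;
    }

    public static void main(String[] args) {
        String input = "x1 42";
        System.out.println(nextToken(input, 0));
        System.out.println(nextToken(input, 2));
        System.out.println(nextToken(input, 3));
    }
}
```

The trade-off is the one mentioned above: every decision goes through
the table lookup, even the trivial ones that an inline if or switch
would have handled directly.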

HTH,
-k
-- 
Kay Röpke
http://classdump.org/