[antlr-interest] how to solve 'code too large' problem?

Fri Jul 13 09:40:42 PDT 2007

On 7/14/07, Chaudhari, Pranita, OPEE14 <Pranita.Chaudhari at eads.com> wrote:
>
>
> Hello,
>
> I am using ANTLR v3 to write a grammar that parses a UML model (exported to
> XMI). For that I created an individual token for each UML element; there are
> now more than 150 tokens, which produces a large lexer Java file, and the
> Java compiler throws the following error:
>
>                    xmiLexerLexer.java:7488: code too large
>                         public void mTokens( )  throws RecognitionException{
>
> Is it possible to split the lexer grammar file into two and import tokens
> from both files into the parser grammar file?
You couldn't import both vocabularies into one parser (the token
numbers would clash). And I don't think you can import a vocabulary
into a lexer, so you'd have issues there too. There is also the problem
of making the two lexers operate off one input stream, so I don't think
that's a very good alternative.

Your best bet is probably to analyse the generated mTokens method,
see where the complexity is coming from, and try to remove some of it.
(javac reports 'code too large' when a single method compiles to more
than 64 KB of bytecode.) It sounds like there must be some pretty
complicated predictors being generated.

Assuming that a large number of element names with common prefixes is
responsible for the complexity, one optimisation that might help is to
use custom code to match some names in a generic rule rather than have
separate rules (or literals in tokens) for each. For instance,
something like:
ELEMENT_NAME:
    ('a'..'z')+ {$type = getElementNameID($text);}
    ;
Where getElementNameID is a function that maps element names to
imaginary token IDs.
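
A minimal sketch of what such a helper could look like, assuming a plain
lookup table is enough. The class name, the token type numbers and the
element names below are all made up for illustration; in a real grammar
the numbers would be the constants ANTLR generates for your imaginary
tokens, and the method would more likely live in the lexer's @members
section.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: maps matched element names to token type IDs. */
final class ElementNameIds {
    // Made-up token type numbers, standing in for the constants ANTLR
    // would generate for imaginary tokens.
    static final int GENERIC_NAME = 4;
    static final int CLASS = 5;
    static final int PACKAGE = 6;
    static final int ATTRIBUTE = 7;

    private static final Map<String, Integer> IDS = new HashMap<>();
    static {
        // Lower-case keys, because the generic rule above only matches 'a'..'z'.
        IDS.put("class", CLASS);
        IDS.put("package", PACKAGE);
        IDS.put("attribute", ATTRIBUTE);
    }

    /** Returns the token type for a known name, or GENERIC_NAME otherwise. */
    static int getElementNameID(String text) {
        Integer id = IDS.get(text);
        return id != null ? id : GENERIC_NAME;
    }
}
```

The point of the design is that the lexer only has to predict one rule,
and the 150-way decision becomes a hash lookup instead of generated
prediction code inside mTokens.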

Or, as others have suggested, you may be best to look at using either
a generic XML parser, or ANTXR, unless there is a compelling reason
not to, for instance parsing complicated content inside your XML
elements.
>
> I also want to apply 'UML model design rules' to the parsed data and check
> whether the model conforms to those rules. How can I write an ANTLR
> grammar for this in a separate grammar file and apply it to the parsed XMI data?
>
> Design rules can be like:
> -  Class names should start with a capital letter.
Sounds like you want to check out tree parsers.
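
The check itself can be ordinary Java that a tree parser action calls
for each class node. A rough sketch of the capital-letter rule follows;
the class and method names are invented here, and how the name is pulled
out of the tree depends on your grammar.

```java
/** Hypothetical rule check; names are illustrative only. */
final class NamingRules {
    /** True if the class name is non-empty and starts with an upper-case letter. */
    static boolean classNameStartsWithCapital(String name) {
        return name != null
                && !name.isEmpty()
                && Character.isUpperCase(name.charAt(0));
    }
}
```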

Tom.
>
>
>
> -Thanks
> Pranita