[antlr-interest] ANTLR Parser file different on different machines - method to exceed 65535 characters

Tue Aug 23 07:59:30 PDT 2011

Thanks for your response. The good news is that, based on your comments, I have now at least managed to reproduce the problem. I ran the generation on a virtual machine with reduced memory and resources and got the same problem.

I will try to use the -Xwatchconversion option to try to identify places where I need to improve the grammar. Am I correct in assuming that this might be related to using predicates with high-level rules? i.e. along the lines of "(expression)=>"

Another possibility is that the STL language I am parsing has case-insensitive keywords, so instead of setting up tokens the normal way, I have had to define each keyword token in the following format:

PROCEDURE    :    ('P'|'p')('R'|'r')('O'|'o')('C'|'c')('E'|'e')('D'|'d')('U'|'u')('R'|'r')('E'|'e');

Could this be adding a lot of extra complexity? If so I would love to find a better way of doing this.
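
One approach often suggested for ANTLR 3 is to fold case in the character stream's lookahead, so that each keyword token can be written once in upper case (for example PROCEDURE : 'PROCEDURE';). The sketch below only illustrates the idea with a stand-in class, CaseFoldingStream, which is hypothetical and not part of ANTLR. In a real ANTLR 3 setup the same folding would presumably go into an override of LA(int) in a subclass of ANTLRStringStream; that is an assumption here and is untested against this grammar. Whether it would reduce the size of the generated DFAs is also not established.

```java
/** Hypothetical stand-in for a character stream; NOT an ANTLR class. */
final class CaseFoldingStream {
    static final int EOF = -1;

    private final char[] data; // original input, case preserved
    private int p = 0;         // index of the next character to consume

    CaseFoldingStream(String input) {
        this.data = input.toCharArray();
    }

    /** Lookahead of i characters (1-based), folded to upper case for matching only. */
    int LA(int i) {
        int index = p + i - 1;
        if (i <= 0 || index >= data.length) {
            return EOF;
        }
        return Character.toUpperCase(data[index]);
    }

    void consume() {
        if (p < data.length) {
            p++;
        }
    }

    int index() {
        return p;
    }

    /** Original, unfolded text from start to stop (both inclusive). */
    String substring(int start, int stop) {
        return new String(data, start, stop - start + 1);
    }

    /** Tries to match a keyword given in upper case; consumes it only on success. */
    boolean matchKeyword(String upperKeyword) {
        for (int i = 0; i < upperKeyword.length(); i++) {
            if (LA(i + 1) != upperKeyword.charAt(i)) {
                return false;
            }
        }
        for (int i = 0; i < upperKeyword.length(); i++) {
            consume();
        }
        return true;
    }
}
```

With this, new CaseFoldingStream("ProCedure x").matchKeyword("PROCEDURE") succeeds while substring(0, 8) still returns the text exactly as written, because only the lookahead is folded and the stored characters are left alone.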

The reason I am using ANTLR 3.2 is that this was the latest version available when development on the project began and now this version has become part of ESA's platform definition for Linux servers. If I can get this changed to ANTLR 3.4, will this work better with reduced memory or will the same thing occur?

Luke

________________________________
From: Justin Murray <jmurray at aerotech.com>
To: antlr-interest at antlr.org
Sent: Tuesday, 23 August 2011, 15:57
Subject: Re: [antlr-interest] ANTLR Parser file different on different machines - method to exceed 65535 characters

Hi Luke,

I may not be the best person to answer, but I do have some suggestions. 
My guess as to why you are seeing different behavior on different 
machines is that the machines probably have different amounts of memory 
available for ANTLR to use. You are requesting a 1GB initial heap (-Xms) 
and a 1GB maximum heap (-Xmx) for the JVM, but if you are running on a 
machine with limited RAM available, there will be issues.
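
To compare what the JVM actually gets on each machine, a small check along these lines could be run with the same -Xms/-Xmx flags as the ANTLR invocation. This is only a sketch: the class name JvmMemoryReport is made up for illustration, and it reports what the JVM sees, not what ANTLR itself uses.

```java
/** Hypothetical helper that prints the memory limits the current JVM sees. */
final class JvmMemoryReport {

    /** Builds a one-line summary of heap limits and CPU count. */
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        return String.format(
            "maxHeapMB=%d totalHeapMB=%d freeHeapMB=%d processors=%d",
            rt.maxMemory() / mb,    // upper bound the heap may grow to (-Xmx)
            rt.totalMemory() / mb,  // heap currently reserved by the JVM
            rt.freeMemory() / mb,   // unused part of the reserved heap
            rt.availableProcessors());
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it as, say, java -Xms1024M -Xmx1024M JvmMemoryReport on both machines and comparing the lines would show whether the requested heap is actually granted in each environment.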

The reason you are using so much RAM and time to generate the parser is 
probably some ambiguity in your grammar. Is there a reason 
why you are using ANTLR 3.2 instead of 3.3 or 3.4? There were changes 
made in 3.3 to eliminate the need for ever using -Xconversiontimeout. 
See the "Important things to note" section on this page: 
http://www.antlr.org/wiki/display/ANTLR3/ANTLR+3.3+Release+Notes. You 
should try using -Xwatchconversion to figure out where exactly your 
grammar is getting stuck, and try to find and fix the ambiguity.

Hope that helps,

- Justin

On 8/23/2011 6:27 AM, Luke Tucker wrote:
> Hi, first of all, a little bit of background that you might find interesting in terms of what ANTLR is being used for...
>
> I have been using ANTLR to convert a language called STL into JavaScript. STL is used to define control procedures for commanding European Space Agency ground station monitoring and control equipment. It is a custom language used only by ESA and is quite quirky in terms of grammar. The existing system is around 15 years old and is being replaced with a new system. The STL procedures are being converted into equivalent JavaScript, which will run in a Rhino engine. My grammar for doing this conversion is more or less complete, and the new JavaScript procedures are successfully commanding ground station equipment running in a simulator. I should also mention that the target language for the grammar is Java and that I am using antlr-3.2.
>
>
> Now for the problem and the reason for my message...
>
> As part of the build process for the application doing the conversion, ANTLR is run from Ant to generate the lexer and parser files before the Java application is built. Previously I had been generating the Parser and Lexer in our environment and committing the resulting Java files to the repository rather than performing the generation as part of the build. This works perfectly well here on a number of different development machines, and even on a virtual machine with a small amount of memory.
>
> However, when we deliver the software to ESA and they run the build process on their machines, the Parser file produced is typically different and the build fails. The build fails because the "static final String DFA53_specialS" and "static final String DFA53_transitionS" arrays are produced with a huge number of elements, together with a massive switch statement in a "specialStateTransition()" method that pushes the method past the JVM's 65535-byte limit on the code of a single method.
> I have read that this can occur with a complex and/or unoptimised grammar, and I will be the first to admit that the grammar I have written might not be 100% optimised. However, since the parser generation works on our machines and not on ESA's, I assume my limited ANTLR-fu is not the root cause of this problem. I have also confirmed that the version of ANTLR being used is exactly the same.
>
> I have done a lot of searching through the mailing list archives and found a suggestion that using -Xconversiontimeout 100000 as an input to ANTLR might help solve this issue, but that doesn't seem to be helping. Just in case I did this wrong, this is how I used this option to generate the file:
>
> java -Xms1024M -Xmx1024M -jar antlr-3.2.jar -Xconversiontimeout 100000
>
> Is there anything else that might cause a large switch block in specialStateTransition() on one machine and not another with the same grammar/same ANTLR version? Should I get them to try with even higher values for Xconversiontimeout or am I barking up the wrong tree with this?
>
> If necessary I can post the grammar I am using (minus the inline code) and an example of the STL language being parsed, but I don't think that's necessary at this stage. In any case, it will probably hurt your eyes.
>
> Thanks very much in advance for any help.
>
> Luke
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
