[antlr-interest] ANTLR Semantic Predicate Check Exceeding 65535 Bytes Limit

Zachary Palmer zep_antlr at bahj.com
Wed Oct 27 15:46:04 PDT 2010


Hello again. :)

I seem to have hit an interesting boundary: ANTLR has generated a method 
whose bytecode exceeds the JVM's 64K (65535-byte) per-method limit.  This 
would be relatively unremarkable - goodness knows that this has happened 
to other people - except that the particular cause in my case seems to 
be somewhat strange.  The following is a method generated in a DFA that 
ANTLR produced for me:

         public int specialStateTransition(int s, IntStream _input) throws NoViableAltException {
             TokenStream input = (TokenStream)_input;
             int _s = s;
             switch ( s ) {
                     case 0 :
                         int LA58_0 = input.LA(1);
                         int index58_0 = input.index();
                         input.rewind();
                         s = -1;
                         if ( (LA58_0==128) && 
((((configuration.getMetaprogramsSupported())&&(configuration.getCodeSplicingSupported()))||((configuration.getMetaAnnotationsSupported())&&(configuration.getCodeSplicingSupported()))||((configuration.getMetaAnnotationsSupported())&&(configuration.getCodeSplicingSupported()))|| 
/****** GREAT BIG SNIP ******/ 
((configuration.getMetaAnnotationsSupported())&&(configuration.getCodeSplicingSupported()))||((configuration.getMetaprogramsSupported())&&(configuration.getCodeSplicingSupported()))))) 
{s = 1;}
                         else if ( (LA58_0==ABSTRACT||LA58_0==CLASS||(LA58_0>=ENUM && LA58_0<=FINAL)
                                 ||LA58_0==INTERFACE||LA58_0==NATIVE||(LA58_0>=PRIVATE && LA58_0<=PUBLIC)
                                 ||(LA58_0>=STATIC && LA58_0<=STRICTFP)||LA58_0==SYNCHRONIZED
                                 ||LA58_0==TRANSIENT||LA58_0==VOLATILE||LA58_0==SEMI||LA58_0==MONKEYS_AT) ) {s = 2;}
                         else if ( (LA58_0==METAPROGRAM_START) && ((configuration.getMetaprogramsSupported()))) {s = 18;}
                         input.seek(index58_0);
                         if ( s>=0 ) return s;
                         break;
                     case 1 :
                         int LA58_1 = input.LA(1);
                         int index58_1 = input.index();
                         input.rewind();
                         s = -1;
                         if ( ((synpred64_BsjAntlr()&&(configuration.getCodeSplicingSupported()))) ) {s = 19;}
                         else if ( 
((((configuration.getMetaAnnotationsSupported())&&(configuration.getCodeSplicingSupported()))||(configuration.getCodeSplicingSupported())||((configuration.getMetaprogramsSupported())&&(configuration.getCodeSplicingSupported()))||((configuration.getMetaAnnotationsSupported())&&(configuration.getCodeSplicingSupported()))||((configuration.getMetaAnnotationsSupported())&&(configuration.getCodeSplicingSupported()))||((configuration.getMetaAnnotationsSupported())&&(configuration.getCodeSplicingSupported()))||((configuration.getMetaAnnotationsSupported())&&(configuration.getCodeSplicingSupported())))) 
) {s = 18;}
                         input.seek(index58_1);
                         if ( s>=0 ) return s;
                         break;
             }
             if (state.backtracking>0) {state.failed=true; return -1;}
             NoViableAltException nvae =
                 new NoViableAltException(getDescription(), 58, _s, input);
             error(nvae);
             throw nvae;
         }

Looks pretty normal... except for the astonishingly common constructs 
like 
"((configuration.getMetaprogramsSupported())&&(configuration.getCodeSplicingSupported()))".  
In fact, each of these constructs is a semantic predicate applied to a 
rule I use fairly often: meta-annotations.  These are similar to normal 
annotations in that they can appear in the modifiers clause of any Java 
declaration (or, additionally, as a prefix to any Java statement).  I 
guarded them with this semantic predicate to allow me to turn my parser 
into a normal Java parser by fiddling with the configuration; I observed 
this technique used in the Java 1.5 parser on the ANTLRv3 site and 
thought it quite sensible.

The catch is that the comment /****** GREAT BIG SNIP ******/ above is 
hiding more than 200,000 characters of code.  This same condition is 
repeated here an enormous number of times.  I've looked throughout my 
parser and discovered that this same pattern appears in many other 
places as well.  In some cases, a generated ANTLR syntactic predicate is 
also called in this fashion, such as in:

                 if ( 
(((synpred188_BsjAntlr()&&(configuration.getCodeSplicingSupported()))||(synpred188_BsjAntlr()&&(configuration.getCodeSplicingSupported()))||(synpred188_BsjAntlr()&&(configuration.getCodeSplicingSupported())))) 
) {
                     alt150=1;
                 }
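
To illustrate why the repetition is pure redundancy: the disjuncts 
visible in the first condition above are only two distinct terms that 
share a common factor, so the whole chain should collapse to something 
like "(metaprograms || metaAnnotations) && codeSplicing".  The following 
is only a sketch under assumptions: it uses plain booleans in place of 
the real configuration getters, assumes the snipped portion repeats the 
same two terms, and the class and method names are hypothetical.  It 
checks all eight input combinations.

```java
// Hypothetical helper, not part of the generated parser.
final class PredicateEquivalenceCheck {

    // The long form, shaped like the visible part of the generated condition
    // (assumption: the snipped portion repeats these same two terms).
    static boolean generated(boolean metaprograms, boolean metaAnnotations,
                             boolean codeSplicing) {
        return (metaprograms && codeSplicing)
            || (metaAnnotations && codeSplicing)
            || (metaAnnotations && codeSplicing)
            || (metaprograms && codeSplicing);
    }

    // The same condition with duplicates dropped and the common factor pulled out.
    static boolean factored(boolean metaprograms, boolean metaAnnotations,
                            boolean codeSplicing) {
        return (metaprograms || metaAnnotations) && codeSplicing;
    }

    // Exhaustively compares both forms over all 2^3 flag combinations.
    static boolean agreeOnAllInputs() {
        for (int bits = 0; bits < 8; bits++) {
            boolean m = (bits & 1) != 0;
            boolean a = (bits & 2) != 0;
            boolean c = (bits & 4) != 0;
            if (generated(m, a, c) != factored(m, a, c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("forms agree: " + agreeOnAllInputs());
    }
}
```

This only holds because the getters are assumed to be side-effect free 
and to return the same value on every call.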

Does anyone have any idea what I've done to so infuriate the gods?  I'll 
probably be using a regular expression to seek through the code and 
eliminate the most egregious of cases -- I'm already using an ANT script 
to add @SuppressWarnings annotations to the classes, so it's not that 
far out of my build process -- but any hint as to how I did this or what 
I could do to solve it correctly would be quite appreciated.
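
For illustration, the kind of post-processing described above might 
look roughly like the sketch below.  It is not the actual build step and 
it tracks parenthesis depth rather than using a regular expression, 
since a plain regex has trouble with nested parentheses.  The class and 
method names are hypothetical.  It assumes the condition text contains 
no string or character literals with parentheses or "||" in them, and 
that the predicates are deterministic and side-effect free, so dropping 
a repeated operand does not change the result.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: removes repeated operands from a "||" chain.
final class DisjunctionDeduplicator {

    // Splits expr on "||" operators that sit outside any parentheses.
    static List<String> splitTopLevelOr(String expr) {
        List<String> parts = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < expr.length(); i++) {
            char c = expr.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == '|' && depth == 0
                    && i + 1 < expr.length() && expr.charAt(i + 1) == '|') {
                parts.add(expr.substring(start, i).trim());
                i++;            // skip the second '|'
                start = i + 1;
            }
        }
        parts.add(expr.substring(start).trim());
        return parts;
    }

    // True if the opening parenthesis at index 0 is closed by the last character.
    static boolean isWrappedInParens(String expr) {
        if (expr.length() < 2 || expr.charAt(0) != '('
                || expr.charAt(expr.length() - 1) != ')') {
            return false;
        }
        int depth = 0;
        for (int i = 0; i < expr.length(); i++) {
            char c = expr.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth == 0 && i < expr.length() - 1) {
                    return false;   // the first group closed before the end
                }
            }
        }
        return depth == 0;
    }

    // Peels redundant outer parentheses, then keeps the first occurrence of
    // each operand of the "||" chain found underneath, in original order.
    static String dedupe(String expr) {
        String e = expr.trim();
        List<String> parts = splitTopLevelOr(e);
        if (parts.size() == 1) {
            if (isWrappedInParens(e)) {
                return "(" + dedupe(e.substring(1, e.length() - 1)) + ")";
            }
            return e;               // nothing to split at this level
        }
        Set<String> unique = new LinkedHashSet<>(parts);
        return String.join("||", unique);
    }

    public static void main(String[] args) {
        String x = "(synpred188_BsjAntlr()&&(configuration.getCodeSplicingSupported()))";
        String condition = "((" + x + "||" + x + "||" + x + "))";
        System.out.println(dedupe(condition));
    }
}
```

Operands are compared as text, so two operands that are logically equal 
but written differently are both kept, and duplicates nested inside an 
"&&" group are left untouched.  This only treats the symptom in the 
generated source; it says nothing about why ANTLR emits the repeated 
predicates in the first place.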

Thanks!

- Zach

