[antlr-interest] Réf. : Re: Strange "code too large" error since *very simple* gated semantic predicates

Jim Idle jimi at temporal-wave.com
Wed Dec 16 06:59:33 PST 2009


Yes, I am afraid your approach to this is incorrect. All that will happen is that you will add more and more predicates and end up needing 150 included grammars to keep individual methods below the size limit. It is definitely time for you to step back and approach this differently. You should be doing much less work in the parser (just general syntax), then processing the specifics of the semantics in your AST.
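
To make the "process the semantics in your AST" step concrete, here is a minimal sketch of a walk that gathers every tag from a tree after a permissive parse. The Node class is only a stand-in for whatever tree type the parser actually builds, and the node kinds "MESSAGE", "BLOCK" and "TAG" are hypothetical names chosen for illustration, not taken from the grammar in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for an AST node (assumption: a real ANTLR tree type would replace it).
class Node {
    final String kind;   // hypothetical node kind, e.g. "MESSAGE", "BLOCK", "TAG"
    final String text;   // payload, e.g. the tag name
    final List<Node> children = new ArrayList<Node>();

    Node(String kind, String text) {
        this.kind = kind;
        this.text = text;
    }

    // Appends a child and returns this node so calls can be chained.
    Node add(Node child) {
        children.add(child);
        return this;
    }
}

class TagCollector {
    // Depth-first walk that gathers the text of every TAG node, in document order.
    static List<String> collectTags(Node root) {
        List<String> tags = new ArrayList<String>();
        walk(root, tags);
        return tags;
    }

    private static void walk(Node node, List<String> out) {
        if ("TAG".equals(node.kind)) {
            out.add(node.text);
        }
        for (Node child : node.children) {
            walk(child, out);
        }
    }
}
```

The collected list can then be checked against the message type in ordinary Java code, with no predicates in the grammar at all.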
 
Jim
 
From: loic.lefevre at bnpparibas.com [mailto:loic.lefevre at bnpparibas.com] 
Sent: Wednesday, December 16, 2009 4:29 AM
To: jimi at temporal-wave.com
Cc: antlr-interest at antlr.org; antlr-interest-bounces at antlr.org
Subject: Réf. : Re: [antlr-interest] Strange "code too large" error since *very simple* gated semantic predicates
 

Hello Jimi, 
First, thanks for your reply. 

As you said, yes I'm really trying to enforce parsing paths. 

What I've tried so far: 

- Used int comparison instead of String comparison => KO 
- Replaced {...}?=> (gated semantic predicates) with {...}? (disambiguating semantic predicates) => OK, the switch now has 152 labels 

I may use a method call next time, but I think I'm just delaying the problem here. 

Regards, 
Loïc 




From: jimi at temporal-wave.com
Sent by: antlr-interest-bounces at antlr.org
Date: 15/12/2009 19:08
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Strange "code too large" error since *very simple* gated semantic predicates

The predicates are likely being hoisted into other rules because of the construction of your grammar. Without seeing the whole grammar it is not really possible to advise you any further. 
  
However, I can infer from your snippet here that you are trying to enforce parsing paths. Wherever possible you should let the parser gather just about anything that COULD be valid syntax, produce an AST, then verify the AST. As you have things, your tags rules will issue syntax errors such as ‘xxx’ unexpected token. However, if you merge all the tags into one rule, you can then walk the tree, check the message type, then see if the tags that were picked up are valid for that message type. Your errors will then be of the form “The tag ‘xxx’ is not valid for message type 103”. 
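
As a rough illustration of that verification step, the sketch below checks a list of parsed tags against a per-message-type table and produces errors in the form quoted above. It assumes the tags have already been collected from the tree. The class name TagValidator, the method validate and the tag sets in the table are all hypothetical placeholders, not the real rules for any message type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical post-parse check; the names and the table contents are illustrative only.
class TagValidator {
    // Allowed tags per message type (placeholder values, not real validation rules).
    private static final Map<String, Set<String>> ALLOWED = new HashMap<String, Set<String>>();
    static {
        ALLOWED.put("103", new HashSet<String>(Arrays.asList("20", "23B", "32A")));
        ALLOWED.put("202", new HashSet<String>(Arrays.asList("20", "21", "32A")));
    }

    // Returns one error message per tag that is not allowed for the given message type.
    static List<String> validate(String messageType, List<String> tags) {
        List<String> errors = new ArrayList<String>();
        Set<String> allowed = ALLOWED.get(messageType);
        if (allowed == null) {
            errors.add("Unknown message type " + messageType);
            return errors;
        }
        for (String tag : tags) {
            if (!allowed.contains(tag)) {
                errors.add("The tag '" + tag + "' is not valid for message type " + messageType);
            }
        }
        return errors;
    }
}
```

With a table like this, supporting another message type is a data change rather than a grammar change, which matters when there are 290+ of them.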
  
So basically, I think that perhaps you are going about the problem in the wrong way and hence you are seeing issues like this. 
  
That said, ANTLR probably isn’t generating the most efficient code that it could, but for the moment that is what it does, I am afraid. The real issue, though, is the way you have put your grammar together. With 290+ message types, the approach you have now really isn’t practical. With more knowledge of your project, I might of course modify my opinion. 
  
Jim 
  
  
  
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of loic.lefevre at bnpparibas.com
Sent: Tuesday, December 15, 2009 9:56 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Strange "code too large" error since *very simple* gated semantic predicates 
  

Hello, 
I'm encountering a strange ANTLR issue. I get a "code too large" error from the Java compiler 
on the DFA method specialStateTransition for the following grammar rule: 

block_4_tags 
       : {"103".equals(messageType)}?=> block_4_mt103_tags 
       | {"202".equals(messageType)}?=> block_4_mt202_tags 
       ; 

The generated method has a switch with 339 labels. 

Example of generated code: 

       public int specialStateTransition(int s, IntStream _input) throws NoViableAltException { 
           TokenStream input = (TokenStream)_input; 
               int _s = s; 
           switch ( s ) { 
                   case 0 : 
                       int LA4_238 = input.LA(1); 

                         
                       int index4_238 = input.index(); 
                       input.rewind(); 
                       s = -1; 
                       if ( (LA4_238==CAPITAL_LETTER) && (("202".equals(messageType)))) {s = 278;} 

                       else if ( (LA4_238==DIGIT) && (("202".equals(messageType)))) {s = 279;} 

                         
                       input.seek(index4_238); 
                       if ( s>=0 ) return s; 
                       break; 
                   case 1 : 
                       int LA4_321 = input.LA(1); 

                         
                       int index4_321 = input.index(); 
                       input.rewind(); 
                       s = -1; 
                       if ( (LA4_321==DIGIT) && (("202".equals(messageType)))) {s = 342;} 

                       else if ( (LA4_321==LETTER) && (("202".equals(messageType)))) {s = 312;} 

                       else if ( (LA4_321==CAPITAL_LETTER) && (("202".equals(messageType)))) {s = 313;} 

                       else if ( (LA4_321==SLASH) && (("202".equals(messageType)))) {s = 314;} 

                       else if ( (LA4_321==SPACE) && (("202".equals(messageType)))) {s = 315;} 

                       else if ( (LA4_321==ANTI_SLASH) && (("202".equals(messageType)))) {s = 316;} 

                       else if ( (LA4_321==MINUS) && (("202".equals(messageType)))) {s = 317;} 

                       else if ( (LA4_321==COLON) && (("202".equals(messageType)))) {s = 318;} 

                       else if ( (LA4_321==LPAREN) && (("202".equals(messageType)))) {s = 319;} 

                       else if ( (LA4_321==RPAREN) && (("202".equals(messageType)))) {s = 320;} 

                       else if ( (LA4_321==DOT) && (("202".equals(messageType)))) {s = 321;} 

                       else if ( (LA4_321==COMMA) && (("202".equals(messageType)))) {s = 322;} 

                       else if ( (LA4_321==PLUS) && (("202".equals(messageType)))) {s = 323;} 

                       else if ( (LA4_321==QUOTE) && (("202".equals(messageType)))) {s = 324;} 

                       else if ( (LA4_321==QUESTION_MARK) && (("202".equals(messageType)))) {s = 325;} 

                         
                       input.seek(index4_321); 
                       if ( s>=0 ) return s; 
                       break; 
... 

As you can see, the gated semantic predicates are propagated to almost every Java statement! 

And this is *very* strange since the calling code is: 

   public final void block_4_tags() throws RecognitionException { 
       int block_4_tags_StartIndex = input.index(); 
       try { 
           if ( state.backtracking>0 && alreadyParsedRule(input, 12) ) { return ; } 
           // SWIFTMT.g:153:9: ({...}? => block_4_mt103_tags | {...}? => block_4_mt202_tags ) 
           int alt4=2; 
           alt4 = dfa4.predict(input); 
           switch (alt4) { 
               case 1 : 
                   // SWIFTMT.g:153:11: {...}? => block_4_mt103_tags 
                   { 
                   if ( !(("103".equals(messageType))) ) { 
                       if (state.backtracking>0) {state.failed=true; return ;} 
                       throw new FailedPredicateException(input, "block_4_tags", "\"103\".equals(messageType)"); 
                   } 
                   if ( state.backtracking==0 ) { 
                      System.out.println("Tags for MT103 chosen!"); 
                   } 
                   pushFollow(FOLLOW_block_4_mt103_tags_in_block_4_tags809); 
                   block_4_mt103_tags(); 

                   state._fsp--; 
                   if (state.failed) return ; 

                   } 
                   break; 
               case 2 : 
                   // SWIFTMT.g:154:11: {...}? => block_4_mt202_tags 
                   { 
                   if ( !(("202".equals(messageType))) ) { 
                       if (state.backtracking>0) {state.failed=true; return ;} 
                       throw new FailedPredicateException(input, "block_4_tags", "\"202\".equals(messageType)"); 
                   } 
                   pushFollow(FOLLOW_block_4_mt202_tags_in_block_4_tags824); 
                   block_4_mt202_tags(); 

                   state._fsp--; 
                   if (state.failed) return ; 

                   } 
                   break; 

           } 
       } 
       catch (RecognitionException re) { 
           reportError(re); 
           recover(input,re); 
       } 
       finally { 
           if ( state.backtracking>0 ) { memoize(input, 12, block_4_tags_StartIndex); } 
       } 
       return ; 
   } 

I would have expected something like: 

if( "103".equals(messageType) ) { 
                   pushFollow(FOLLOW_block_4_mt103_tags_in_block_4_tags809); 
                   block_4_mt103_tags(); 

                   state._fsp--; 
                   if (state.failed) return ; 
} else 
if( "202".equals(messageType) ) { 
                   pushFollow(FOLLOW_block_4_mt202_tags_in_block_4_tags824); 
                   block_4_mt202_tags(); 

                   state._fsp--; 
                   if (state.failed) return ; 
} else { /* error check? */ } 

and of course this DFA4 would never exist :o) 

Is this currently possible? 

Does anyone have a workaround? 

I'll also try int comparison (I'm lucky since these are numbers) but I've got more message types to test (290+). 

Regards, 
Loïc 
  
  
  
  
List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

