[antlr-interest] Réf. : Re: Pass parameters to DFAs for semantic predicate (or AntLR 3.3 wish list? :o) )

Jim Idle jimi at temporal-wave.com
Wed Dec 16 08:31:15 PST 2009


This is why I think you might be better off with a filtering lexer only. You can then get ANTLR to do a lot of the grunt work and have some supporting methods work the specific magic. I think that the parser is just getting in your way here. When I have written SWIFT parsing before, I have used high-level text manipulation languages, where it is much easier to deal with the logic. Regular expressions for it would be pretty much unmaintainable, which is why of course half the implementations I have seen use regular expressions ;-)
 
Jim
 
From: loic.lefevre at bnpparibas.com [mailto:loic.lefevre at bnpparibas.com] 
Sent: Wednesday, December 16, 2009 8:03 AM
To: jimi at temporal-wave.com
Cc: antlr-interest at antlr.org; antlr-interest-bounces at antlr.org
Subject: Réf. : Re: [antlr-interest] Pass parameters to DFAs for semantic predicate (or AntLR 3.3 wish list? :o) )
 

Yes, parsing SWIFT by hand is very easy (I almost finished it in 30 minutes). 

But this is where my problems begin: there are ambiguities to handle. 

For example, we can have a tag named 53B; its format is [/1!a][/34x]#[35x] 

which means: 
- optionally, a / followed by exactly 1 uppercase letter 
- optionally, a / followed by 1 to 34 characters 
- then a line break (CRLF); this \r\n may be omitted if both previous fields are absent or if the following field is absent 
- optionally, 1 to 35 characters 

Now with the input String: 

/YOH 
LCN484841 

I need to detect: 
- field 1 is not present 
- field 2's value is YOH 
- field 3's value is LCN484841 
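
The disambiguation described above can be sketched in plain Java with a backtracking regular expression. This is only an illustration under stated assumptions: the class name Tag53BSketch is hypothetical, the SWIFT "x" character set is approximated as "any character except CR/LF", and a bare LF is accepted in place of CRLF. A failed attempt to read "/Y" as field 1 makes the matcher back up and read "/YOH" as field 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for the 53B format [/1!a][/34x]#[35x]; not part of any SWIFT library. */
final class Tag53BSketch {
    // Assumption: the "x" character set is approximated as "anything but CR/LF".
    private static final String X = "[^\\r\\n]";

    // Alternative 1: optional field 1, optional field 2, then optionally a
    //                line break followed by an optional field 3.
    // Alternative 2: field 3 alone, with no line break in front of it.
    private static final Pattern FORMAT = Pattern.compile(
        "(?:/([A-Z]))?(?:/(" + X + "{1,34}))?(?:\\r?\\n(" + X + "{1,35})?)?"
        + "|(" + X + "{1,35})");

    /** Returns {field1, field2, field3} with null for absent fields, or null if nothing matches. */
    static String[] parse(String value) {
        Matcher m = FORMAT.matcher(value);
        if (!m.matches()) {
            return null;
        }
        // Field 3 is captured by group 3 or group 4 depending on the alternative taken.
        String field3 = m.group(3) != null ? m.group(3) : m.group(4);
        return new String[] { m.group(1), m.group(2), field3 };
    }

    public static void main(String[] args) {
        String[] fields = parse("/YOH\r\nLCN484841");
        System.out.println(java.util.Arrays.toString(fields));
    }
}
```

The order of the alternatives decides which reading wins when several are possible, which is exactly the kind of priority rule that gets harder to maintain as the formats grow.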

And of course, you may have far more complex regular expressions to handle (this one is fairly simple). 

Regards, 
Loïc 




From: jimi at temporal-wave.com 
Sent by: antlr-interest-bounces at antlr.org 
Date: 16/12/2009 16:53 
To: antlr-interest at antlr.org 
Subject: Re: [antlr-interest] Pass parameters to DFAs for semantic predicate (or AntLR 3.3 wish list? :o) )
 
		



Your predicate is based on a local variable, so the generated methods for the DFA cannot see it. You will have to store the length in a scope variable and use that in your predicate: 
  
data_x[ int length ] 
returns[ String s ] 
scope { 
int sLen = 0; 
} 
@init { 
final StringBuilder sb = new StringBuilder(); 
} 
@after { 
 s = sb.toString(); 
} 
  
{ sb.append($d.text); $data_x::sLen = sb.length(); … 
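
Why a scope attribute helps where a local variable does not can be modeled in plain Java. The sketch below is not ANTLR output; every name in it (ScopeVisibilitySketch, DataXScope, PredictionDfa, maxLen) is made up for illustration, and storing the maximum length in the scope alongside sLen is an assumption. The point is only that a separate prediction class can read a field of the parser, such as a scope stack, but can never read a local of the rule method.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified model of a generated parser; all names are illustrative, not ANTLR output. */
final class ScopeVisibilitySketch {
    /** Stand-in for the dynamic scope of rule data_x. */
    static final class DataXScope {
        int sLen;    // current length, as in the scope shown above
        int maxLen;  // assumption: the rule parameter is copied here as well
    }

    // The scope stack is a field of the parser, so nested classes can reach it.
    final Deque<DataXScope> dataXStack = new ArrayDeque<>();

    /** Stand-in for a generated DFA class: it never sees the locals of dataX(). */
    final class PredictionDfa {
        boolean mayConsumeMore() {
            // Referring to 'sb' or 'length' here would not compile; the scope does.
            DataXScope scope = dataXStack.peek();
            return scope.sLen < scope.maxLen;
        }
    }

    /** Stand-in for the rule method: reads at most 'length' characters. */
    String dataX(String input, int length) {
        StringBuilder sb = new StringBuilder();   // local: invisible to PredictionDfa
        DataXScope scope = new DataXScope();
        scope.maxLen = length;
        dataXStack.push(scope);
        PredictionDfa dfa = new PredictionDfa();
        try {
            int pos = 0;
            while (pos < input.length() && dfa.mayConsumeMore()) {
                sb.append(input.charAt(pos++));
                scope.sLen = sb.length();         // keep the scope in sync with sb
            }
            return sb.toString();
        } finally {
            dataXStack.pop();                     // scopes are popped when the rule exits
        }
    }

    public static void main(String[] args) {
        System.out.println(new ScopeVisibilitySketch().dataX("ABCDEFGH", 3));
    }
}
```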
  
However I am not sure if it is safe for you to just return from the rule yourself. It might be though as you are not building trees etc. 
  
I wonder if, rather than a lexer/parser, you just need a filtering lexer, or whether this format really lends itself to being parsed by something like ANTLR at all. Perhaps you just need hand-crafted code. Are these SWIFT records or something similar with fixed-length/length-encoded fields? Something like awk may be better for this. 
  
Jim 
  
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of loic.lefevre at bnpparibas.com
Sent: Wednesday, December 16, 2009 7:16 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Pass parameters to DFAs for semantic predicate (or AntLR 3.3 wish list? :o) ) 
  

Hello again, 
I continue to struggle with AntLR :o) 

I think I've got a real problem now. 

I have a grammar that is thoroughly ambiguous, which is why I absolutely need backtracking :o) 

It is so ambiguous that I also need variable-length tokens. 

For example, when I need to parse at most 16 chars (for a given data type), I've got: 

data_x[ int length ] 
returns[ String s ] 
@init { 
final StringBuilder sb = new StringBuilder(); 
} 
@after { 
 s = sb.toString(); 
} 
: (               ( d=DIGIT { sb.append($d.text);if( sb.length() == length ) { return sb.toString(); }} | 
                     l=LETTER { sb.append($l.text);if( sb.length() == length ) { return sb.toString(); }} | 
                     cl=CAPITAL_LETTER { sb.append($cl.text);if( sb.length() == length ) { return sb.toString(); }} | 
                     SLASH { sb.append('/');if( sb.length() == length ) { return sb.toString(); }} | 
                     SPACE { sb.append(' ');if( sb.length() == length ) { return sb.toString(); }} | 
                     ANTI_SLASH { sb.append('\\');if( sb.length() == length ) { return sb.toString(); }} | 
                     MINUS { sb.append('-');if( sb.length() == length ) { return sb.toString(); }} | 
                     COLON { sb.append(':');if( sb.length() == length ) { return sb.toString(); }} | 
                     LPAREN { sb.append('(');if( sb.length() == length ) { return sb.toString(); }} | 
                     RPAREN { sb.append(')');if( sb.length() == length ) { return sb.toString(); }} | 
                     DOT { sb.append('.');if( sb.length() == length ) { return sb.toString(); }} | 
                     COMMA { sb.append(',');if( sb.length() == length ) { return sb.toString(); }} | 
                     PLUS { sb.append('+');if( sb.length() == length ) { return sb.toString(); }} | 
                     QUOTE { sb.append('\'');if( sb.length() == length ) { return sb.toString(); }} | 
                     QUESTION_MARK { sb.append('?');if( sb.length() == length ) { return sb.toString(); }} 
                   ) 
 )+ 
; 

I know this is awful, but at least it works; or, to be precise, it worked. 
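
For comparison, the bounded read that this rule performs can be sketched as a few lines of hand-written Java. The class and method names are hypothetical, and DIGIT, LETTER and CAPITAL_LETTER are assumed to mean ASCII digits, lowercase letters and uppercase letters; the punctuation set is taken from the alternatives of the rule above.

```java
/** Hypothetical hand-written counterpart of the data_x rule. */
final class BoundedFieldReader {
    // Punctuation accepted by data_x: / space \ - : ( ) . , + ' ?
    private static final String PUNCTUATION = "/ \\-:().,+'?";

    /** Assumption: DIGIT, LETTER and CAPITAL_LETTER are the ASCII ranges. */
    static boolean isAllowed(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || PUNCTUATION.indexOf(c) >= 0;
    }

    /**
     * Reads between 1 and maxLength allowed characters starting at 'start'.
     * Returns the text read, or null if no allowed character is found there.
     */
    static String read(String input, int start, int maxLength) {
        int end = start;
        while (end < input.length()
                && end - start < maxLength
                && isAllowed(input.charAt(end))) {
            end++;
        }
        return end > start ? input.substring(start, end) : null;
    }

    public static void main(String[] args) {
        System.out.println(read("ABCDEFGHIJKLMNOPQRST", 0, 16));
    }
}
```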

The problem here is that I can't use a disambiguating semantic predicate such as: 

data_x[ int length ] 
returns[ String s ] 
@init { 
final StringBuilder sb = new StringBuilder(); 
} 
@after { 
 s = sb.toString(); 
} 
: ( 
{sb.length() < length}? 
                  ( d=DIGIT { sb.append($d.text);if( sb.length() == length ) { return sb.toString(); }} | 
                     l=LETTER { sb.append($l.text);if( sb.length() == length ) { return sb.toString(); }} | 
... 

since the sb and length variables are not visible inside the generated DFA :o( 

It would be useful to have at least the length parameter "pushed" into the DFA via a generated setter, for example: 

   class DFA149 extends DFA { 
       
       private int length; 

       public DFA149(BaseRecognizer recognizer) { 
       ... 
       } 

       public void setLength( int length ) { 
            this.length = length; 
       } 

       public String getDescription() { 
           return "()+ loopback of 1163:3: ({...}? (d= DIGIT | l= LETTER | cl= CAPITAL_LETTER | SLASH | SPACE | ANTI_SLASH | MINUS | COLON | LPAREN | RPAREN | DOT | COMMA | PLUS | QUOTE | QUESTION_MARK ) )+"; 
       } 
       public int specialStateTransition(int s, IntStream _input) throws NoViableAltException { 
           TokenStream input = (TokenStream)_input; 
               int _s = s; 
           switch ( s ) { 
                   case 0 : 
                       int LA149_14 = input.LA(1); 

                         
                       int index149_14 = input.index(); 
                       input.rewind(); 
                       s = -1; 
                       if ( ((synpred230_SWIFTMT()&&(sb.length() < length))) ) {s = 17;} 

                       else if ( ((sb.length() < length)) ) {s = 1;} 
... 

Then the length parameter could be used inside the specialStateTransition method, and the same principle could apply to the synpred230_SWIFTMT() methods as well. 
One point I don't understand is why my predicate is not placed before the generated syntactic predicate, like: 

                       if ( (((sb.length() < length)&&synpred230_SWIFTMT())) ) {s = 17;} 

instead of 

                       if ( ((synpred230_SWIFTMT()&&(sb.length() < length))) ) {s = 17;} 

since my comparison is faster :o) Maybe there is a reason for this ordering; could someone explain it to me? 
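
The cost difference between the two orderings comes from Java's short-circuit evaluation of &&. The sketch below only demonstrates that effect; expensiveSynpred() is a stand-in that always succeeds, not the real synpred230_SWIFTMT(), and nothing here explains why ANTLR emits the order it does.

```java
/** Illustrates short-circuit evaluation; all names are hypothetical. */
final class PredicateOrderSketch {
    static int expensiveCalls = 0;

    /** Stand-in for a syntactic predicate; counts how often it is evaluated. */
    static boolean expensiveSynpred() {
        expensiveCalls++;
        return true;
    }

    /** Order seen in the generated code: synpred first, length check second. */
    static boolean synpredFirst(int currentLength, int maxLength) {
        return expensiveSynpred() && currentLength < maxLength;
    }

    /** Proposed order: the cheap length check can skip the synpred entirely. */
    static boolean cheapFirst(int currentLength, int maxLength) {
        return currentLength < maxLength && expensiveSynpred();
    }

    public static void main(String[] args) {
        synpredFirst(16, 16);
        System.out.println("synpred first: " + expensiveCalls + " call(s)");
        expensiveCalls = 0;
        cheapFirst(16, 16);
        System.out.println("cheap first:   " + expensiveCalls + " call(s)");
    }
}
```

Both orderings return the same boolean; only the number of synpred evaluations differs once the maximum length has been reached.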


Finally, I have of course another problem with the kind of action I use: 

if( sb.length() == length ) { return sb.toString(); } 

I just return from the rule once I have reached the maximum length. This works well, since the catch and finally blocks properly handle what needs to be done (backtracking / error handling). 
However, when backtracking, the action is not run; see the generated code: 

                   case 1 : 
                       // C:\\GRP_Head\\GRP_Dev\\Development\\frameworks\\Foxhound\\target\\generated\\com\\bnpparibas\\acetp\\foxhound\\spec2009\\parser\\SWIFTMT.g:1108:6: cl= CAPITAL_LETTER 
                       { 
                       cl=(Token)match(input,CAPITAL_LETTER,FOLLOW_CAPITAL_LETTER_in_data_a8285); if (state.failed) return s; 
                       if ( state.backtracking==0 ) { 
                          sb.append((cl!=null?cl.getText():null)); if( sb.length() == length ) { return sb.toString(); } 
                       } 

                       } 
                       break; 

So this "trick" does not work anymore (it used to work, however). 
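
The effect of that guard can be reproduced in a small stand-alone model. This is a sketch, not generated code: the class and method names are hypothetical, and the integer parameter merely mirrors the role of state.backtracking. With the value 0 the early return limits the match; with any other value the action is skipped, so the speculative pass keeps consuming input past the maximum length.

```java
/** Models an action guarded by a backtracking counter; names are hypothetical. */
final class BacktrackingGuardSketch {
    /**
     * Consumes characters the way the data_x loop does and returns how many
     * were consumed. 'backtracking' plays the role of state.backtracking.
     */
    static int consume(String input, int length, int backtracking) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos++);   // matching always advances the input
            if (backtracking == 0) {        // the action only runs outside backtracking
                sb.append(c);
                if (sb.length() == length) {
                    return pos;             // the early-return trick
                }
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println("normal pass:      " + consume("ABCDEFGH", 3, 0));
        System.out.println("speculative pass: " + consume("ABCDEFGH", 3, 1));
    }
}
```

Because the speculative pass and the real pass consume different amounts of input, a prediction made while backtracking may not agree with what the rule does afterwards.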

With a grammar handling 2 message types (see previous posts), there is no problem. 
With a third one, I get the following error message: 

line 2:5 no viable alternative at input 'C' 


I am beginning to doubt that ANTLR v3 will be able to parse SWIFT MT messages :o( 


Regards, 
Loïc 
  
  
  
  