[antlr-interest] Syntactic predicates cause unexplainable compilation errors in different parts of the code

Loring Craymer Loring.G.Craymer at jpl.nasa.gov
Wed Jan 26 14:54:25 PST 2005


At 02:11 PM 1/26/2005, Peter Robinson wrote:
>On Wed, 2005-01-26 at 22:48, Loring Craymer wrote:
> > This is one of those cases that is usually handled by factoring out the comma:
> >
> > gene_ref:
> >        gene_refline ("," gene_refline)*
> >        ;
> >
>
>Thanks, I did consider that but was stumped because of the following.
>When I parse a GeneRef object, I create a Java object to store the info
>that in turn gets passed further up. I need to initialize up to 6
>different variables in this object depending on which type of
>gene_refline are found (mainly strings, but also more complicated
>structures). This is why I wanted to try to do everything in one rule so
>that I would always have the reference to the GeneRef object handy...

I'd handle this case by having gene_refline be passed a GeneRef argument 
unless there is a reason for insisting on order of keywords (either order 
is needed for evaluation, or you are producing a syntax checker).
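
A rough sketch of that idea, written as a hand-coded Java analogue of the two rules rather than as ANTLR grammar. Everything named here is an assumption made for illustration: the GeneRef fields, the token spellings ("GENEREF_KW", "A", "B", "E"), and the helper methods are hypothetical, not the poster's actual classes or what ANTLR would generate. The point is only that the sub-rule receives the object and fills in whichever field matches the keyword it sees.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hand-written analogue of a gene_ref rule that passes its GeneRef to gene_refline.
// All class, method, and token names are illustrative assumptions.
class GeneRefSketch {
    // Hypothetical stand-in for the poster's GeneRef class.
    static class GeneRef {
        final List<String> strings = new ArrayList<>();
        boolean flag = false;
    }

    private final List<String> tokens;
    private int pos = 0;

    GeneRefSketch(List<String> tokens) { this.tokens = tokens; }

    // One token of lookahead, in the spirit of LA(1).
    private String la() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        return tokens.get(pos++);
    }

    private void match(String expected) {
        String t = next();
        if (!t.equals(expected)) {
            throw new IllegalStateException("expected " + expected + ", found " + t);
        }
    }

    // gene_ref: GENEREF_KW "{" gene_refline ("," gene_refline)* "}"
    GeneRef geneRef() {
        match("GENEREF_KW");
        match("{");
        GeneRef gr = new GeneRef();
        geneRefline(gr);                 // the object is handed down to the sub-rule
        while (la().equals(",")) {
            match(",");
            geneRefline(gr);
        }
        match("}");
        return gr;
    }

    // gene_refline: each alternative sets its own field on the shared object.
    private void geneRefline(GeneRef gr) {
        String keyword = next();
        String value = next();
        switch (keyword) {
            case "A":
            case "B":
                gr.strings.add(keyword + "=" + value);
                break;
            case "E":
                gr.flag = Boolean.parseBoolean(value);
                break;
            default:
                throw new IllegalStateException("unknown keyword " + keyword);
        }
    }

    public static void main(String[] args) {
        GeneRef gr = new GeneRefSketch(
            Arrays.asList("GENEREF_KW", "{", "B", "x", ",", "E", "TRUE", "}")).geneRef();
        System.out.println(gr.strings + " " + gr.flag);
    }
}
```

Note that a loop like this accepts the lines in any order and does not reject repeats, which is the trade-off mentioned above: it is fine for reading valid data, but not for a strict syntax checker.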


>gene_ref returns [GeneRef gr=null]
>{
>         String s;
>} GENEREF_KW { gr = new GeneRef(); }
>         s=gene_refline { gr.addString(s); }  // But there are 6 different
>                 // types of String variable to be initialized,
>                 // depending on what type of gene_refline
>
>
>
> > However, I think what you are really running into is ANTLR 2's approximate
> > LLk.  If you look at the generated code (without the synpreds), I think
> > that you will find that it does the right thing.
> >
>
>It was actually (nearly) correct, but since a bunch of downstream

Would it do the right thing, though?  Usually, the if statements have 
cross-product conditionals (if ((LA(1) == A || LA(1) == B) && (LA(2) == 
STRING || LA(2) == BOOLEAN)))
which are not strictly correct, but the case statements impose the correct 
orderings when matching tokens.
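
To make the cross-product point concrete, here is a small Java sketch. It assumes, purely for illustration, an alternative that can begin with either "A STRING" or "B BOOLEAN"; the token names and both methods are hypothetical and are not taken from the generated parser. The approximate test checks each lookahead depth on its own, so it also lets through pairs that no alternative can start with.

```java
import java.util.Arrays;
import java.util.List;

// Compares a per-depth (cross-product) lookahead test with an exact pair test.
// The token set and the two start sequences are assumptions for illustration.
class LookaheadSketch {
    enum Tok { A, B, STRING, BOOLEAN }

    // The only two-token sequences the assumed alternative can start with.
    static final List<List<Tok>> EXACT = Arrays.asList(
        Arrays.asList(Tok.A, Tok.STRING),
        Arrays.asList(Tok.B, Tok.BOOLEAN));

    // Exact k=2 test: the pair itself must be a real start sequence.
    static boolean exact(Tok la1, Tok la2) {
        return EXACT.contains(Arrays.asList(la1, la2));
    }

    // Approximate test: each depth is checked independently of the other.
    static boolean approx(Tok la1, Tok la2) {
        return (la1 == Tok.A || la1 == Tok.B)
            && (la2 == Tok.STRING || la2 == Tok.BOOLEAN);
    }

    public static void main(String[] args) {
        for (Tok t1 : Tok.values()) {
            for (Tok t2 : Tok.values()) {
                if (approx(t1, t2) != exact(t1, t2)) {
                    System.out.println("approximation also accepts: " + t1 + " " + t2);
                }
            }
        }
    }
}
```

Under these assumptions the two tests disagree on "A BOOLEAN" and "B STRING". The prediction is too permissive for those pairs, and it is the token-by-token matching inside the chosen alternative that would then reject them.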

>analyses depend on correct parsing, I would somehow like to get rid of
>all error messages...

A good goal--every time I run into this consequence of the LLk-approx, I 
get paranoid.  There is an inline option to suppress the nondeterminism 
warnings when you can verify that the generated code would work properly, 
but I tend to avoid that myself.

--Loring



>Monty, thanks for your reply. Yes I did look at the ASN.1 grammar on the
>website, but (given that I am not that well versed in ASN.1, at least
>not yet) I was not able to adapt that to my needs...
>
>
>
>
> > --Loring
> >
> >
> > At 12:34 PM 1/26/2005, Peter Robinson wrote:
> >
> >
> > >Dear ANTLR list,
> > >
> > >First of all thanks to you all for being a helpful and informative list.
> > >I recently have been trying to learn antlr and cannot now imagine using
> > >things like lex/yacc with which I previously occasionally did things.
> > >
> > >I am now trying to parse a file structure from NCBI in ASN.1 format. The
> > >specification of a small part of the entire thing is as follows  ( I
> > >have replaced some keywords with the letters A-H for clarity). Any one
> > >of the entries is optional and is followed by a comma if there is going
> > >to be another line. There are Gene-ref entries with only one entry (and
> > >no comma).
> > >
> > >
> > >Gene-ref ::= SEQUENCE {
> > >      A VisibleString OPTIONAL ,
> > >      B VisibleString OPTIONAL ,
> > >     C VisibleString OPTIONAL ,
> > >     D VisibleString OPTIONAL ,
> > >     E BOOLEAN DEFAULT FALSE ,
> > >     F SET OF Dbtag OPTIONAL ,
> > >     G SET OF VisibleString OPTIONAL ,
> > >     H  VisibleString OPTIONAL }
> > >   END
> > >
> > >After trying constructs such as (",")? and getting nondeterminateness
> > >warnings, I tried my hand at a syntactic predicate as follows:
> > >
> > >generef_line returns [myJavaObject ... ]
> > >{
> > >         String s;
> > >         Dbtag d;
> > >}: GENE_KW "{"
> > >        (        ( A STRING ",")=>
> > >            A  s1:STRING { System.out.println(s1.getText()); }  ","
> > >         |  A  s2:STRING { System.out.println(s2.getText()); }
> > >         )?
> > >         (  (B STRING ",")=>
> > >            B s3:STRING { System.out.println(s3.getText()); } ","
> > >         |  B s4:STRING {  System.out.println(s4.getText()); }
> > >         )?
> > >         AND SO ON...
> > >
> > >         "}"
> > >;
> > >
> > >
> > >However, this now causes unexplainable compilation errors in other parts
> > >of the code (about 400 lines of grammar etc) to appear, in code that
> > >**worked perfectly fine** before. What is going on?? and is there a
> > >better way to parse the above construct? Thanks, Peter
> > >
> > >--
> > >Peter N. Robinson
> > >peter.robinson at t-online.de
> > >peter.robinson at charite.de
> > >http://www.charite.de/ch/medgen/robinson/
>--
>Peter N. Robinson
>peter.robinson at t-online.de
>peter.robinson at charite.de
>http://www.charite.de/ch/medgen/robinson/



