[antlr-interest] [C Target][3.1.1] Trying to understand the behavior of rules with kleene stars

Sven Van Echelpoel sven.van.echelpoel at empolis.com
Tue May 12 00:41:30 PDT 2009


On Mon, 2009-05-11 at 12:03 -0700, Loring Craymer wrote:
> This is a symptom of not having an EOF at the end of your top level rule--you need to add EOF after ';'.
> 
I'm sorry, but I don't quite understand what you mean here. I don't
think I have seen this mentioned before, though I may well have
overlooked it. Do you mean that I have to explicitly add an EOF token at
the end of the top-level rule, like so (the rule differs slightly from
before, as I now see that the earlier version is not what I actually have):

translation_unit
  : ( ( declaration | rule )* ';' ) EOF  // <-- Add it here?
    -> ^( UNIT rule* )
  ;

Sven

> --Loring
> 
> 
> 
> ----- Original Message ----
> > From: Sven Van Echelpoel <sven.van.echelpoel at empolis.com>
> > To: "antlr-interest at antlr.org" <antlr-interest at antlr.org>
> > Sent: Monday, May 11, 2009 7:30:29 AM
> > Subject: [antlr-interest] [C Target][3.1.1] Trying to understand the behavior of rules with kleene stars
> > 
> > Hi,
> > 
> > I'm having trouble understanding the behavior of the parser w.r.t.
> > invalid tokens in rules with Kleene star elements. My grammar says
> > that a translation unit is zero or more rules, declarations, etc.,
> > e.g.:
> > 
> > translation_unit
> >   : ( declaration | rule )* ';'
> >     -> ^( UNIT rule* )    // only care about rules
> >   ;
> > 
> > Now, if a rule is followed, after the semicolon, by a token that is
> > illegal at that position, no more rules are processed and no error is
> > generated. Looking at the generated code, you get something like this:
> > 
> > for (;;)
> > {
> >   int alt2=2;
> >   {
> >     int LA2_0 = LA(1);
> >     if ( LA2_0 == /*some tokens expected at this position*/  )  // (1)
> >     {
> >       alt2=1;
> >     }
> > 
> > 
> >   }
> >   switch (alt2) 
> >   {
> >   case 1:
> >     /* Continue here if this was what was expected */
> >     break;
> >   default:
> >     goto loop2;    /* break out of the loop */                    //(2)
> >     break;
> >   }
> > }
> > loop2: ; /* Jump out to here if this rule does not match */    //(3)
> > 
> > In (1) the lookahead token is checked against a set of expected
> > tokens; there can be several more else-if branches after it. If the
> > token is unexpected, alt2 keeps the value 2 and the subsequent switch
> > takes the default case (2), which simply breaks out of the loop. After
> > the loop2 label, processing continues as if nothing had happened (3).
> > In the example above, the AST rewrite rules are then invoked.
> > 
> > Note that this pattern is applied consistently every time a Kleene
> > star is used in a rule: if a token is unexpected at that position,
> > processing just stops and no error is raised. It seems to me that the
> > code is a bit too liberal in interpreting the zero of zero-or-more :-),
> > i.e. zero occurrences of the expected items are accepted, and whatever
> > unexpected input follows is silently ignored. Am I right, or am I
> > missing something?
> > 
> > Apologies if this is a real issue that has already been fixed after
> > 3.1.1. I found nothing in the bug db and currently have no time to
> > investigate this in a later release.
> > 
> > Sven
> 
> 
> 
>       


