[antlr-interest] [C Target][3.1.1] Trying to understand the behavior of rules with kleene stars
Sven Van Echelpoel
sven.van.echelpoel at empolis.com
Tue May 12 00:41:30 PDT 2009
On Mon, 2009-05-11 at 12:03 -0700, Loring Craymer wrote:
> This is a symptom of not having an EOF at the end of your top level rule--you need to add EOF after ';'.
>
I'm sorry, but I don't quite understand what you mean here. I don't
think I have ever seen this mentioned before, though I may well have
overlooked it. Do you mean that I have to explicitly add an EOF token at
the end of the top-level rule? Like so (the rule is slightly different
from before, as I now see that the earlier version is not what I have in
reality):
translation_unit
    : ( ( declaration | rule )* ';' ) EOF // <-- Add it here?
      -> ^( UNIT rule* )
    ;
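
If I read the suggestion correctly, the difference can be sketched in plain C as below. This is a hand-written stand-in for the rule above, not ANTLR output: the token names and parse_translation_unit are made up for illustration, each declaration or rule is modelled as a single token, and the require_eof flag only mimics what adding EOF to the grammar would presumably do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token types, invented for this sketch. */
enum token { TOK_EOF = 0, TOK_DECL, TOK_RULE, TOK_SEMI, TOK_BOGUS };

/*
 * Stand-in for:  ( declaration | rule )* ';'  [EOF]
 * The tokens array must end with TOK_EOF.
 * Returns 0 on success and -1 where a mismatch would be reported;
 * *consumed receives the number of tokens matched so far.
 */
int parse_translation_unit(const enum token *tokens, int require_eof,
                           size_t *consumed)
{
    size_t i = 0;

    assert(tokens != NULL && consumed != NULL);

    /* The (...)* loop: leave quietly as soon as the lookahead token
       cannot start another alternative. That is a legal exit, not an error. */
    for (;;) {
        enum token la = tokens[i];
        if (la == TOK_DECL || la == TOK_RULE) {
            i++;            /* matched one more declaration or rule */
            continue;
        }
        break;              /* corresponds to "goto loop2" */
    }

    /* Match the ';' that follows the loop. */
    if (tokens[i] != TOK_SEMI) {
        *consumed = i;
        return -1;
    }
    i++;

    /* Without this check the rule returns here and whatever is left
       in the stream is never looked at. */
    if (require_eof && tokens[i] != TOK_EOF) {
        *consumed = i;
        return -1;
    }

    *consumed = i;
    return 0;
}
```

For the token sequence rule ';' bogus rule ';', calling this with require_eof set to 0 returns success after two tokens and never looks at the rest, whereas with require_eof set to 1 it reports a mismatch at the bogus token.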
Sven
> --Loring
>
>
>
> ----- Original Message ----
> > From: Sven Van Echelpoel <sven.van.echelpoel at empolis.com>
> > To: "antlr-interest at antlr.org" <antlr-interest at antlr.org>
> > Sent: Monday, May 11, 2009 7:30:29 AM
> > Subject: [antlr-interest] [C Target][3.1.1] Trying to understand the behavior of rules with kleene stars
> >
> > Hi,
> >
> > I'm having trouble understanding the behavior of the parser w.r.t.
> > invalid tokens in rules with Kleene star elements. I have this grammar
> > that says that a translation unit is zero or more rules, declarations,
> > etc. e.g.
> >
> > translation_unit
> >     : ( declaration | rule )* ';'
> >       -> ^( UNIT rule* ) // only care about rules
> >     ;
> >
> > Now, if a rule is followed, after the semicolon, by a token that is
> > illegal at that position, no more rules are processed. No error is
> > generated. Looking at the generated code, you get something like this:
> >
> > for (;;)
> > {
> >     int alt2=2;
> >     {
> >         int LA2_0 = LA(1);
> >         if ( LA2_0 == /*some tokens expected at this position*/ ) // (1)
> >         {
> >             alt2=1;
> >         }
> >     }
> >     switch (alt2)
> >     {
> >         case 1:
> >             /* Continue here if this was what was expected */
> >             break;
> >         default:
> >             goto loop2; /* break out of the loop */ //(2)
> >             break;
> >     }
> > }
> > loop2: ; /* Jump out to here if this rule does not match */ //(3)
> >
> > In (1) the look ahead token is checked against a set of expected tokens.
> > There can be multiple else if branches following this too. If the token
> > is unexpected, the value of alt2 remains 2 and in the subsequent switch
> > the default case (2) is taken. This simply breaks out of the loop. After
> > the loop2 label processing continues as if nothing has happened (3). In
> > our example above, AST rewrite rules are invoked.
> >
> > Note that this pattern is consistently applied every time a Kleene star
> > is used somewhere in a rule. If a token is unexpected at that position,
> > processing just stops and no error is raised. It seems to me that the
> > code is a bit too liberal in interpreting the zero of zero-or-more :-) ,
> > i.e. even zero times something expected is fine, erroneously discounting
> > the stuff that is unexpected. Am I right, or am I missing something?
> >
> > Apologies if this is a real issue and it has already been fixed after
> > 3.1.1. I found nothing in the bug db and currently have no time to
> > investigate this in a later release.
> >
> > Sven
> >
> >
> >
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe:
> > http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>
>
>
>