[antlr-interest] Matching balanced parentheses in a tree grammar

Wed Sep 1 12:22:40 PDT 2010

The contributed plsql.g doesn't handle FOR i IN ( SELECT ... ) LOOP  .... 
quite right - it slurps EVERYTHING after the IN up to the LOOP by

                   ~(LOOP)+

Since I want to disambiguate between normal for-loops where you have 

                 FOR i IN 1 .. 10 LOOP ....

and those with a SQL select expression, I need to be smarter here.  It's 
not sufficient to have a rule like

         for_statement :  FOR ID IN LPAREN (~RPAREN)+ RPAREN LOOP ......

because the SELECT expression can have nested parentheses.  So we need 
something smarter.

I studied 
http://www.antlr.org/pipermail/antlr-interest/2009-October/036333.html 
which tackles this problem for lexers, and implemented
something similar.  I almost have it with the following (using a C 
runtime)  I have the  rules:

for_loop_statement scope { ANTLR3_UINT32 depth; }
       @init { $for_loop_statement::depth = plsql_paren_depth;
               plsql_paren_depth = 0;
             }
       @after{ plsql_paren_depth = $for_loop_statement::depth; }
    :
       FOR ID IN LPAREN {++plsql_paren_depth;} SELECT swallow_to_rparen 
RPAREN

          LOOP ( statement SEMI )+ END LOOP label_name?   -> ^(FOR ID 
^(SELECT swallow_to_rparen ) statement+)
    ;

swallow_to_rparen : 
       {parenMatch(ctx)}?
    ;

Where we have a validating semantic predicate (op. cit Antlr Ref, p285) 
match that checks parenMatch(ctx).  And parenMatch is implemented as a 
method with

@parser::members {
  int plsql_paren_depth;
  static ANTLR3_BOOLEAN parenMatch (pplsqlParser ctx) {
    pANTLR3_PARSER parser = ctx->pParser;
    pANTLR3_TOKEN_STREAM ts = parser->getTokenStream(parser);
    ANTLR3_UINT32 tok;

    if (plsql_paren_depth == 0) return false; // no pending RPAREN needed

      while ((tok=LA(1)) != EOF) {
        switch (tok) {
        case LPAREN: ++plsql_paren_depth; break;
        case RPAREN: --plsql_paren_depth; break;
        default: break;
        }

        if (plsql_paren_depth == 0) return true;
        ts->istream->consume(ts->istream); // munch the token
      }

      if (tok == EOF) return false;
    return false;               // unreachable
  }
}

This looks ahead to find the LPARENs or RPARENs and adjusts the depth 
appropriately.  Once the depth is zero
it can return success.

The only problem (of course) is that we are consuming the tokens not 
buffering them up for the swallow_to_rparen rule to return. So I end up 
constructing a
parse tree for the for_statement that has an empty list for the SELECT 
subtree

How can I fix this?  Or is there a better way to obtain the balanced paren 
match without having to extend my grammar to understand the guts of the 
SELECT.
(I plan to take the select tokens and send them off to an existing Bison 
parser that knows all about SQL statements)

Do I need to have an ANTLR_LIST that I concat tokens onto (inside my 
validating semantic predicate)  and somehow make that the result of the 
swallow_to_rparen rule?

ForwardSourceID:NT000798C2 

NOTICE  from Ab Initio: If received in error, please destroy and notify sender, and make no further use, disclosure, or distribution. This email (including attachments) may contain information subject to confidentiality obligations, and sender does not waive confidentiality or privilege.