[antlr-interest] collecting tokens without invoking parser rules...

Alan Lehotsky ALehotsky at ABINITIO.COM
Mon Jan 17 13:40:02 PST 2011


I am using ANTLR 3.2 with language=C as the target.

For parsing Teradata's stored-procedure language (SPL), we have the issue 
of context-sensitive token hiding.

I'm trying to use rules for SQL statements embedded in SPL that just 
swallow the tokens, so we have rules like:


        swallow_to_semi :  ~( SEMI )* ;

        update_stmt     :  UPDATE swallow_to_semi ;

We take the stream of tokens from this UPDATE rule and pass them off to an 
existing SQL parser.

But, because SPL has an assignment statement rule that looks like

                assignment_stmt :  SET  dotted_name '='  expression SEMI;

and Teradata SQL uses 'SET' within its own grammar, when I encounter a 
source statement like


               update mytable  set x = y, a = b where a = 'none' ;

I get an error that makes it clear to me that the Antlr parser is 'seeing' 
the 'set' and trying to invoke the assignment_stmt rule: the complaint is 
about expecting a "SEMI" at the source position where the comma is.

I don't think that redirecting EVERYTHING in the lexer after the UPDATE to 
an alternate channel will work in all cases, because there are other 
context sensitivities in play - for example:

SELECT has to read everything to a SEMI when it appears in a statement 
context, but when there is a select clause in a FOR statement, it must 
read up to a USING, FOR, DO or SEMI token.
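
To make that context dependence concrete, here is a small stand-alone model in plain C. It does not use the ANTLR runtime at all: the token type values, the scan_context enum and the function names are made up for illustration, and the real token types would come from the generated parser header. It only shows the idea that the same scan needs a different set of terminators depending on where the SELECT appears.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token types; the real values come from the generated header. */
enum { T_EOF = -1, T_SEMI = 1, T_USING, T_FOR, T_DO, T_SELECT, T_OTHER };

/* Where the SELECT appears decides which tokens end the swallowed text. */
typedef enum { CTX_STATEMENT, CTX_FOR_CLAUSE } scan_context;

/* SEMI and EOF always stop the scan; USING, FOR and DO stop it only
   inside a FOR statement's select clause. */
static int is_terminator(scan_context where, int type)
{
    if (type == T_SEMI || type == T_EOF)
        return 1;
    if (where == CTX_FOR_CLAUSE)
        return type == T_USING || type == T_FOR || type == T_DO;
    return 0;
}

/* Returns how many tokens, starting at index `start`, belong to the
   embedded SQL. The terminator itself is not counted. */
static size_t swallow_length(const int *types, size_t n, size_t start,
                             scan_context where)
{
    size_t i = start;
    assert(types != NULL);
    while (i < n && !is_terminator(where, types[i]))
        ++i;
    return i - start;
}
```

With the same token array, a call using CTX_STATEMENT runs through to the SEMI, while a call using CTX_FOR_CLAUSE stops at the first USING, FOR or DO.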

So far, what I have tried is code that looks like this:


  static ANTLR3_BOOLEAN semicolonMatch ( pplsqlParser ctx, pANTLR3_VECTOR & tokens)
  {
    pANTLR3_PARSER parser = ctx->pParser;
    pANTLR3_TOKEN_STREAM ts = parser->getTokenStream(parser);
    ANTLR3_INT32 tok;
    if( ! tokens)      // If we didn't have a token list, start one now
      tokens = ctx->vectors->newVector( ctx->vectors);

    if (LA(0) == SEMI) return false; // e.g. "COMMIT ;"

    while( ( tok=LA( 1) ) != EOF)
    {
      switch( tok)
      {
        case SEMI:       return true; 
        case EOF:        return false;
        default:
          tokens->add( tokens, LT( 1), NULL);
          ts->istream->consume( ts->istream);
          continue;
      }
    }
    return false;
  }


And a modified swallow_to_semi rule that looks like

     swallow_to_semi :  tokenlist+=( {semicolonMatch(ctx, $tokenlist) }? ) -> $tokenlist+

but that doesn't work correctly, because it seems to preemptively swallow 
the SEMI, and a statement like

        COMMIT;

fails.
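
To be precise about the behaviour I am after, here is a stand-alone model in plain C. It is only a sketch: mock_stream, mock_la1 and collect_to_semi are hypothetical names, the token values are invented, and nothing here calls the ANTLR runtime. The point is that the helper should only look at the next token, never consume the SEMI, and treat "zero tokens collected" (the COMMIT ; case) as a success.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token types, for illustration only. */
enum { TK_EOF = -1, TK_SEMI = 1, TK_COMMIT, TK_UPDATE, TK_OTHER };

/* Minimal stand-in for a token stream: an array plus a cursor. */
typedef struct {
    const int *types;
    size_t     count;
    size_t     pos;    /* index of the token a one-token lookahead returns */
} mock_stream;

/* One-token lookahead; does not move the cursor. */
static int mock_la1(const mock_stream *s)
{
    return s->pos < s->count ? s->types[s->pos] : TK_EOF;
}

/* Copies token types into `out` until SEMI is the next token.
   Returns 1 when it stops in front of a SEMI, which stays unconsumed,
   and 0 when the input runs out or `out` is full.
   *collected may legitimately be 0, as in "COMMIT ;". */
static int collect_to_semi(mock_stream *s, int *out, size_t cap,
                           size_t *collected)
{
    assert(s != NULL && out != NULL && collected != NULL);
    *collected = 0;
    for (;;) {
        int type = mock_la1(s);
        if (type == TK_SEMI)
            return 1;
        if (type == TK_EOF || *collected == cap)
            return 0;
        out[(*collected)++] = type;
        s->pos++;              /* consume only non-terminator tokens */
    }
}
```

In this model, starting the scan just after COMMIT returns success with an empty list and leaves the cursor on the SEMI, so the enclosing statement rule can still match it.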

This feels like something that should be relatively easy to do, but I 
don't seem to be able to figure out exactly how to make it happen and I 
haven't hit upon the right search terms to find an appropriate example in 
the Antlr-interest archives or the Wiki.



  