[antlr-interest] collecting tokens without invoking parser rules...
    Alan Lehotsky 
    ALehotsky at ABINITIO.COM
       
    Mon Jan 17 13:40:02 PST 2011
    
    
  
We are using ANTLR 3.2 with language=C as the target to parse Teradata's 
stored-procedure language (SPL), and we have run into a problem with 
context-sensitive token hiding.
I'm trying to use rules for SQL statements embedded in SPL that just 
swallow the tokens, so we have rules like:
        swallow_to_semi : ~( SEMI )* ;
        update_stmt     : UPDATE swallow_to_semi ;
We take the stream of tokens from this UPDATE rule and pass them off to an 
existing SQL parser.
But SPL has an assignment statement rule that looks like
        assignment_stmt : SET dotted_name '=' expression SEMI ;
and Teradata SQL uses 'SET' within its own grammar. So when I encounter a 
source statement like
        update mytable set x = y, a = b where a = 'none' ;
I get an error complaining that a SEMI was expected at the source position 
of the comma, which makes it clear to me that the ANTLR parser is 'seeing' 
the 'set' and trying to invoke the assignment_stmt rule.
I don't think that redirecting EVERYTHING in the lexer after the UPDATE to 
an alternate channel will work in all cases, because there are other 
context sensitivities in play - for example:
SELECT has to read everything up to a SEMI when it appears in a statement 
context, but when there is a select clause in a FOR statement, it must 
read up to a USING, FOR, DO or SEMI token.
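To make the two contexts concrete, here is a minimal sketch in plain C of 
scanning to a context-dependent stop set. It does not use the ANTLR runtime: 
the token types (T_SEMI, T_USING, ...), the int array standing in for the 
token stream, and the names find_stop, STMT_STOPS and FOR_SELECT_STOPS are 
all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token types; a real grammar would get these from ANTLR. */
enum { T_EOF = -1, T_SEMI = 1, T_USING, T_FOR, T_DO, T_OTHER };

/* Stop set for SELECT in a statement context. */
static const int STMT_STOPS[] = { T_SEMI };
/* Stop set for a select clause inside a FOR statement. */
static const int FOR_SELECT_STOPS[] = { T_SEMI, T_USING, T_FOR, T_DO };

/* Return 1 if type appears in the stops array, else 0. */
static int is_stop(int type, const int *stops, size_t nstops)
{
    for (size_t i = 0; i < nstops; i++)
        if (stops[i] == type)
            return 1;
    return 0;
}

/* Starting at pos, return the index of the first token whose type is in
 * stops, or the index where the scan ended (n, or a T_EOF token) if no
 * stop token was found. Pure look-ahead: nothing is consumed. */
static size_t find_stop(const int *toks, size_t n, size_t pos,
                        const int *stops, size_t nstops)
{
    assert(toks != NULL && stops != NULL);
    while (pos < n && toks[pos] != T_EOF && !is_stop(toks[pos], stops, nstops))
        pos++;
    return pos;
}
```

The caller picks the stop set from the context it is in, so the same scan 
serves both the statement-level SELECT and the select clause of a FOR.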
So, what I have tried so far is code that looks like:
  static ANTLR3_BOOLEAN semicolonMatch ( pplsqlParser ctx, pANTLR3_VECTOR & tokens)
  {
    pANTLR3_PARSER parser = ctx->pParser;
    pANTLR3_TOKEN_STREAM ts = parser->getTokenStream(parser);
    ANTLR3_INT32 tok;
    if( ! tokens)      // If we didn't have a token list, start one now
      tokens = ctx->vectors->newVector( ctx->vectors);
    if (LA(0) == SEMI) return false; // e.g. "COMMIT ;"
    while( ( tok=LA( 1) ) != EOF)
    {
      switch( tok)
      {
        case SEMI:       return true; 
        case EOF:        return false;
        default:
          tokens->add( tokens, LT( 1), NULL);
          ts->istream->consume( ts->istream);
          continue;
      }
    }
    return false;
  }
And a modified swallow_to_semi rule that looks like:
     swallow_to_semi :  tokenlist+=( {semicolonMatch(ctx, $tokenlist) }? ) -> $tokenlist+ ;
but that doesn't work correctly: it seems to preemptively swallow the SEMI, 
so a statement like
        COMMIT;
fails.
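For reference, the behaviour I am after can be sketched in plain C, outside 
ANTLR entirely. This is only a model under stated assumptions: Cursor, peek 
and swallow_to_semi_model are hypothetical names, the token stream is a plain 
int array, and the TK_* values are invented. The point is that the terminator 
is only ever looked at, never consumed, so an empty body such as "COMMIT ;" 
collects zero tokens and leaves SEMI for the enclosing rule to match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token types for the model. */
enum { TK_EOF = -1, TK_SEMI = 1, TK_COMMIT, TK_UPDATE, TK_WORD };

/* Stand-in for a token stream: pos is the next unconsumed token. */
typedef struct {
    const int *toks;
    size_t     n;
    size_t     pos;
} Cursor;

/* Type of the next unconsumed token (the analogue of LA(1)). */
static int peek(const Cursor *c)
{
    return c->pos < c->n ? c->toks[c->pos] : TK_EOF;
}

/* Copy token types into out up to, but not including, the next SEMI.
 * Returns the number collected (possibly 0). If EOF is reached or out
 * fills up before a SEMI is seen, restores the cursor and returns -1. */
static long swallow_to_semi_model(Cursor *c, int *out, size_t cap)
{
    assert(c != NULL && out != NULL);
    size_t start = c->pos;
    size_t count = 0;

    while (peek(c) != TK_SEMI) {
        if (peek(c) == TK_EOF || count == cap) {
            c->pos = start;          /* undo: nothing was swallowed */
            return -1;
        }
        out[count++] = c->toks[c->pos++];
    }
    return (long)count;              /* cursor now sits on the SEMI */
}
```

With the cursor placed just after COMMIT, the function returns 0 and the 
next token is still SEMI; after UPDATE it returns the body tokens and again 
stops on the SEMI.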
This feels like something that should be relatively easy to do, but I 
can't figure out exactly how to make it happen, and I haven't hit upon 
the right search terms to find an appropriate example in the 
antlr-interest archives or the wiki.
  