[antlr-interest] collecting tokens without invoking parser rules...
Alan Lehotsky
ALehotsky at ABINITIO.COM
Mon Jan 17 13:40:02 PST 2011
I'm using ANTLR 3.2 with language=C as the target.
For parsing Teradata's stored-procedure language (SPL), we have the issue
of context-sensitive token hiding.
I'm trying to use rules for SQL statements embedded in SPL that just
swallow the tokens, so we have rules like:
swallow_to_semi : ~ ( SEMI ) * ;
update_stmt : UPDATE swallow_to_semi;
We take the stream of tokens from this UPDATE rule and pass them off to an
existing SQL parser.
But, because SPL has an assignment statement rule that looks like
assignment_stmt : SET dotted_name '=' expression SEMI;
and Teradata SQL uses 'SET' within its own grammar, when I encounter a
source statement like
update mytable set x = y, a = b where a = 'none' ;
I get an error that makes it clear to me that the ANTLR parser is 'seeing'
the 'set' and trying to invoke the assignment_stmt rule, because the
complaint is about expecting a "SEMI" at the source position where the
comma is.
I don't think that redirecting EVERYTHING in the lexer after the UPDATE to
an alternate channel will work in all cases, because there are other
context sensitivities in play - for example:
SELECT has to read everything to a SEMI when it appears in a statement
context, but when there is a select clause in a FOR statement, it must
read up to a USING, FOR, DO or SEMI token.
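The context dependence described above can be modelled outside ANTLR. The following is a minimal, self-contained C sketch and not ANTLR runtime code: the T_* token values, the SelectContext enum, and the functions is_stop and select_span are all hypothetical names chosen for illustration, and the token stream is assumed to be a plain array terminated by T_EOF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token types; real values would come from the generated lexer. */
enum { T_EOF = 0, T_SEMI, T_SELECT, T_FOR, T_USING, T_DO, T_OTHER };

/* Hypothetical contexts in which a SELECT can appear. */
typedef enum { CTX_STATEMENT, CTX_FOR_CLAUSE } SelectContext;

/* Return nonzero if `type` ends the SELECT text in the given context.
 * SEMI always ends it; USING, FOR and DO end it only inside a FOR statement. */
static int is_stop(SelectContext ctx, int type)
{
    if (type == T_SEMI)
        return 1;
    if (ctx == CTX_FOR_CLAUSE)
        return type == T_USING || type == T_FOR || type == T_DO;
    return 0;
}

/* Count the tokens starting at `pos` up to, but not including, the first
 * stop token for `ctx` (or T_EOF). Nothing is consumed. */
static size_t select_span(const int *toks, size_t pos, SelectContext ctx)
{
    size_t len = 0;
    assert(toks != NULL);
    while (toks[pos + len] != T_EOF && !is_stop(ctx, toks[pos + len]))
        len++;
    return len;
}
```

With the same token array, select_span returns a longer span for CTX_STATEMENT than for CTX_FOR_CLAUSE whenever a USING, FOR or DO token appears before the SEMI, which is the distinction a single lexer-level channel switch cannot express.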
So, what I tried so far was code that looks like
static ANTLR3_BOOLEAN semicolonMatch ( pplsqlParser ctx, pANTLR3_VECTOR & tokens)
{
    pANTLR3_PARSER parser = ctx->pParser;
    pANTLR3_TOKEN_STREAM ts = parser->getTokenStream(parser);
    ANTLR3_INT32 tok;
    if( ! tokens) // If we didn't have a token list, start one now
        tokens = ctx->vectors->newVector( ctx->vectors);
    if (LA(0) == SEMI) return false; // e.g. "COMMIT ;"
    while( ( tok=LA( 1) ) != EOF)
    {
        switch( tok)
        {
        case SEMI: return true;
        case EOF: return false;
        default:
            tokens->add( tokens, LT( 1), NULL);
            ts->istream->consume( ts->istream);
            continue;
        }
    }
    return false;
}
And a modified swallow_to_semi rule that looks like
swallow_to_semi : tokenlist+=( {semicolonMatch(ctx, $tokenlist) }? )
-> $tokenlist+
But that doesn't work correctly: it seems to preemptively swallow the
SEMI, so a statement like
COMMIT;
fails.
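The intended behaviour, where the collector stops in front of the SEMI and leaves it for the enclosing rule to match, can be modelled outside ANTLR. This is a minimal, self-contained C sketch under stated assumptions, not ANTLR runtime code: TokenCursor, peek, collect_to_semi and the TOK_* values are hypothetical stand-ins for the real token stream and token types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token types; real values would come from the generated lexer. */
enum { TOK_EOF = 0, TOK_SEMI, TOK_COMMIT, TOK_UPDATE, TOK_SET, TOK_OTHER };

/* Minimal stand-in for a token stream: an EOF-terminated array plus a cursor. */
typedef struct {
    const int *types;
    size_t pos;
} TokenCursor;

/* Look at the next token type without consuming it. */
static int peek(const TokenCursor *c)
{
    return c->types[c->pos];
}

/* Collect token types up to, but not including, the next TOK_SEMI.
 * Returns the number of tokens collected (possibly 0), or -1 if EOF is
 * reached first or `out` is full. On success the cursor is left pointing
 * at the semicolon, so the caller can still match it. */
static long collect_to_semi(TokenCursor *c, int *out, size_t cap)
{
    size_t n = 0;
    assert(c != NULL && out != NULL);
    for (;;) {
        int t = peek(c);
        if (t == TOK_SEMI)
            return (long)n;       /* stop before the semicolon */
        if (t == TOK_EOF || n == cap)
            return -1;            /* unterminated statement or buffer full */
        out[n++] = t;
        c->pos++;                 /* consume only non-terminator tokens */
    }
}
```

In this model a statement such as "COMMIT ;" yields an empty collection with the cursor still on the SEMI, rather than a failure.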
This feels like something that should be relatively easy to do, but I
don't seem to be able to figure out exactly how to make it happen and I
haven't hit upon the right search terms to find an appropriate example in
the Antlr-interest archives or the Wiki.