[antlr-interest] v2->v3 Skip wrapper and inside quotes in LITERAL of SQL // C-target [SOLVED v3]
Ruslan Zasukhin
ruslan_zasukhin at valentina-db.com
Mon Apr 18 10:09:10 PDT 2011
Hi Guys,
Below is my solution for the LITERAL rule of our SQL grammar.
GOOD:
* everything is done at the LEXER level.
* most literals take the efficient GETCHARINDEX() + EMIT() path.
* the more complex algorithm runs only when QUOTE QUOTE is found
(a rare case in practice).
BAD:
* I don't know yet whether pTmpStr needs to be freed manually.
* I don't know yet whether this solution will work for UTF16 input to the Lexer.
* I have to access the produced Token object directly to modify ITS copy
of the text.
* I still think this solution is far less trivial than the ! operator
of ANTLR v2.
* the solution is very target-oriented IMO.
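
On the UTF16 point: strchr() scans bytes and stops at the first zero byte, so on 16-bit code units it would give up at the first ASCII character. Below is a minimal sketch of the same collapse done on 16-bit code units. It is only an illustration: collapse_doubled_quotes_u16 is a hypothetical helper, not part of the ANTLR3 C runtime, and whether the runtime hands the action a 16-bit buffer here is an assumption I have not verified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: collapse '' to ' in place in a buffer of 16-bit
   code units. len is the number of code units; the new count is returned.
   A backslash escape such as \' is copied through unchanged. */
size_t collapse_doubled_quotes_u16(uint16_t *text, size_t len)
{
    assert(text != NULL);
    size_t dst = 0;
    for (size_t src = 0; src < len; ++src) {
        uint16_t c = text[src];
        text[dst++] = c;
        if (c == 0x005C && src + 1 < len) {
            /* Backslash: copy the escaped code unit as it is. */
            ++src;
            text[dst++] = text[src];
        } else if (c == 0x0027 && src + 1 < len && text[src + 1] == 0x0027) {
            /* Doubled quote: the first was copied, skip the second. */
            ++src;
        }
    }
    return dst;
}
```

Scanning code unit by code unit is safe for surrogate pairs, because 0x0027 and 0x005C never occur inside one.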
IMO: the ideal would be ANTLR's own syntax to control the lexer's output.
Can anybody give hints for a better solution? Before offering ideas, please
check the STRING_LITERAL rule below carefully:
QUOTE QUOTE must be allowed **inside** a STRING_LITERAL,
and one of the two quotes must be skipped.
Example:
'aa''bb''cc''dd' => aa'bb'cc'dd
//-------------------------------------------------------------
// String literals:

fragment
LETTER // caseSensitive = false, so we use only small chars.
    :   'a'..'z'
    |   '@'
    ;

fragment
ESCAPE_SEQUENCE // Escape for VSQL can be: \' \_ \%
    :   '\\' ( QUOTE | '_' | '%' )
    ;

STRING_LITERAL
@init
{
    int dquotes_count = 0;
    int theStart = $start;
}
    :   QUOTE { theStart = GETCHARINDEX(); }
        (   ESCAPE_SEQUENCE
        |   ~('\'' | '\\')
        |   QUOTE QUOTE { ++dquotes_count; }
        )*
        { $start = theStart; EMIT(); }
        QUOTE
        {
            if( dquotes_count > 0 ) // ONLY if '' was found
            {
                pANTLR3_COMMON_TOKEN pToken = LEXSTATE->token;
                pANTLR3_STRING pTmpStr = pToken->getText( pToken );
                char* pBegin = (char*) pTmpStr->chars;
                char* pStart = pBegin;

                // We shrink the string in the same buffer.
                while( dquotes_count > 0 )
                {
                    char* pFirstQuote = strchr( pStart, '\'' );
                    if( pFirstQuote == NULL ) // no quotes left: stop
                        break;

                    if( *(pFirstQuote + 1) != '\'' ) // is there a second quote?
                    {
                        pStart = pFirstQuote + 1; // lone quote (e.g. from \'), skip it
                        continue;
                    }

                    // Example: 'aa''bb''cc''dd' => aa'bb'cc'dd
                    // Offsets are measured from the buffer start, so the count
                    // stays correct after pStart has advanced.
                    int CharsOnLeft = (int)(pFirstQuote - pBegin) + 1;
                    int CharsToMove = (int)pTmpStr->len - CharsOnLeft; // includes the trailing '\0'
                    ANTLR3_MEMMOVE( pFirstQuote + 1, pFirstQuote + 2, CharsToMove );

                    // prepare for the next pass:
                    pStart = pFirstQuote + 1;
                    pTmpStr->len--;
                    --dquotes_count;
                }

                pToken->setText( pToken, pTmpStr );
            }
        }
    ;
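
The collapse done in the action above can also be tried outside ANTLR. Here is a minimal sketch in plain C, assuming an 8-bit, NUL-terminated buffer. It is a single-pass variant with a read pointer and a write pointer, not the repeated memmove used in the rule, and collapse_doubled_quotes is a hypothetical name, not something from the ANTLR3 C runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: collapse every doubled quote ('') in a
   NUL-terminated buffer to a single quote, in place.
   Returns the new length of the string. */
size_t collapse_doubled_quotes(char *text)
{
    assert(text != NULL);
    char *src = text;   /* read position  */
    char *dst = text;   /* write position */
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] != '\0') {
            /* Escape sequence such as \' : copy both characters unchanged. */
            *dst++ = *src++;
            *dst++ = *src++;
        } else if (src[0] == '\'' && src[1] == '\'') {
            /* Doubled quote: keep one, drop the other. */
            *dst++ = '\'';
            src += 2;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return (size_t)(dst - text);
}
```

Called on a writable copy of aa''bb''cc''dd it leaves aa'bb'cc'dd in the buffer. Each character is moved at most once, so the cost does not grow with the number of doubled quotes.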