[antlr-interest] v2->v3 Skip wrapper and inside quotes in LITERAL of SQL // C-target [SOLVED v3]
Jim Idle
jimi at temporal-wave.com
Mon Apr 18 10:25:46 PDT 2011
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Ruslan Zasukhin
> Sent: Monday, April 18, 2011 10:09 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] v2->v3 Skip wrapper and inside quotes in
> LITERAL of SQL // C-target [SOLVED v3]
>
> Hi Guys,
>
> Below I have pasted my solution for the LITERAL rule of our SQL grammar.
>
> GOOD:
>
> * Everything is done at the LEXER level.
> * Most literals take the efficient path: GETCHARINDEX() + EMIT().
> * The more complex algorithm runs only if QUOTE QUOTE was found
> (a rare case in practice).
>
> BAD:
>
> * I don't know yet whether pTmpStr needs to be freed manually.
> * I don't know yet whether this solution will work for UTF-16 lexer
> input.
>
> * I have to access the produced Token object directly to modify ITS
> copy of the text.
>
> * I still think this solution is far less trivial than the ! operator
> of ANTLR v2.
> * The solution is very target-specific, IMO.
> IMO, the ideal would be ANTLR's own syntax for controlling the lexer's output.
>
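On the UTF-16 question: a byte-oriented strchr() over a char* treats any zero byte as the end of the string, and UTF-16 encodes every ASCII character with one zero byte, so the char*-based loop is unlikely to work unchanged on 16-bit text. A minimal sketch of the same collapse over 16-bit code units follows. It assumes the token text is available as an array of uint16_t with a known length; the function name and signature are hypothetical and not part of the ANTLR runtime.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an ANTLR API: collapse each doubled quote
 * (U+0027 U+0027) in a UTF-16 buffer to one quote, in place.
 * Backslash escapes (U+005C plus one code unit) are copied unchanged.
 * Returns the new length in code units. */
size_t collapse_doubled_quotes_u16(uint16_t *s, size_t len)
{
    size_t src = 0;
    size_t dst = 0;

    while (src < len) {
        if (s[src] == 0x005C && src + 1 < len) {
            s[dst++] = s[src++];            /* the backslash      */
            s[dst++] = s[src++];            /* the escaped unit   */
        } else if (s[src] == 0x0027 && src + 1 < len && s[src + 1] == 0x0027) {
            s[dst++] = 0x0027;              /* keep one quote     */
            src += 2;                       /* drop the other one */
        } else {
            s[dst++] = s[src++];
        }
    }
    return dst;
}
```

Neither U+0027 nor U+005C falls in the surrogate range, so comparing individual code units does not split a surrogate pair.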
> Can anybody suggest a better solution? Before offering ideas, please
> check the STRING_LITERAL rule below carefully:
> QUOTE QUOTE must be allowed **inside** a STRING_LITERAL,
> and one of the two quotes must be skipped.
>
> Example:
> 'aa''bb''cc''dd' => aa'bb'cc'dd
>
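For reference, the transformation in the example above can be sketched as a standalone single-pass C function, independent of ANTLR. This is only an illustration under stated assumptions: the input is a NUL-terminated 8-bit buffer with the outer quotes already stripped, and the function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not an ANTLR API: collapse each doubled quote ('')
 * in a NUL-terminated buffer to a single quote, in place and in one pass.
 * Backslash escapes such as \' are copied through unchanged.
 * Returns the new string length. */
size_t collapse_doubled_quotes(char *s)
{
    char *src = s;
    char *dst = s;

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] != '\0') {
            *dst++ = *src++;                /* the backslash      */
            *dst++ = *src++;                /* the escaped char   */
        } else if (src[0] == '\'' && src[1] == '\'') {
            *dst++ = '\'';                  /* keep one quote     */
            src += 2;                       /* drop the other one */
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```

Called on a buffer holding aa''bb''cc''dd, it leaves aa'bb'cc'dd in the buffer and returns 11. Because the write position never overtakes the read position, no memmove is needed.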
>
> //-------------------------------------------------------------
> // String literals:
>
> fragment
> LETTER // caseSensitive = false, so we use only lowercase chars.
> : 'a'..'z'
> | '@'
> ;
>
> fragment
> ESCAPE_SEQUENCE // Escape for VSQL can be: \' \_ \%
> : '\\' ( QUOTE | '_' | '%' )
> ;
>
> STRING_LITERAL
> @init
> {
> int dquotes_count = 0;
> int theStart = $start;
> }
> : QUOTE { theStart = GETCHARINDEX(); }
> ( ESCAPE_SEQUENCE
> | ~('\'' | '\\')
> | QUOTE QUOTE { ++dquotes_count; }
> )*
> { $start = theStart; EMIT(); }
> QUOTE
> {
>     if( dquotes_count > 0 ) // ONLY if '' was found
>     {
>         pANTLR3_COMMON_TOKEN pToken = LEXSTATE->token;
>
>         pANTLR3_STRING pTmpStr = pToken->getText( pToken );
>         char* pBase  = (char*) pTmpStr->chars;
>         char* pStart = pBase;
>
>         // Shrink the string inside the same buffer.
>         while( dquotes_count > 0 )
>         {
>             char* pFirstQuote = strchr( pStart, '\'' );
>             if( pFirstQuote == NULL )
>                 break;
>
>             if( *(pFirstQuote + 1) != '\'' ) // lone quote (e.g. from \'): skip it
>             {
>                 pStart = pFirstQuote + 1;
>                 continue;
>             }
>
>             // Example: 'aa''bb''cc''dd' => aa'bb'cc'dd
>             // Count from the buffer start, so the move also covers the
>             // terminating NUL and never runs past the end of the buffer.
>             int CharsOnLeft = (int)(pFirstQuote - pBase) + 1;
>             int CharsToMove = (int)pTmpStr->len - CharsOnLeft;
>
>             ANTLR3_MEMMOVE( pFirstQuote + 1, pFirstQuote + 2, CharsToMove );
>
>             // Prepare for the next iteration:
>             pStart = pFirstQuote + 1;
>             pTmpStr->len--;
>             --dquotes_count;
>         }
>
>         pToken->setText( pToken, pTmpStr );
>     }
> }
> ;
>
>
>