[antlr-interest] v2->v3 Skip wrapper and inside quotes in LITERAL of SQL // C-target [SOLVED v3]

Jim Idle jimi at temporal-wave.com
Mon Apr 18 10:25:46 PDT 2011


???

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Ruslan Zasukhin
> Sent: Monday, April 18, 2011 10:09 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] v2->v3 Skip wrapper and inside quotes in
> LITERAL of SQL // C-target [SOLVED v3]
>
> Hi Guys,
>
> Below I copy and paste my solution for the LITERAL rule of our SQL grammar.
>
> GOOD:
>
> * everything is done at the LEXER level.
> * uses the efficient GETCHARINDEX() + EMIT() approach for most literals.
> * the more complex algorithm runs only when QUOTE QUOTE is found (a rare
> case in practice).
>
> BAD:
>
>     * I don't know yet whether pTmpStr needs to be freed manually.
>     * I don't know yet whether this solution will work for UTF16 input
> to the lexer.
>
>     * I have to access the produced Token object directly to modify ITS
> copy of the text.
>
>     * I still think this solution is far less trivial than the !
> operator of ANTLR v2.
>     * the solution is very target-specific, IMO.
>          IMO: the ideal would be ANTLR's own syntax to control the
> lexer's output.
>
> Can anybody give hints for a better solution? Before offering ideas,
> please check the STRING_LITERAL rule below carefully:
>     QUOTE QUOTE must be allowed **inside** a STRING_LITERAL,
>     and one of the two quotes must be skipped.
>
> Example:
>      'aa''bb''cc''dd'   =>   aa'bb'cc'dd
>
>
> //-------------------------------------------------------------
> // String literals:
>
> fragment
> LETTER               // caseSensitive = false, so we use only lowercase chars.
>     :    'a'..'z'
>     |   '@'
>     ;
>
> fragment
> ESCAPE_SEQUENCE                      // Escape for VSQL can be:  \'  \_  \%
>     :    '\\' ( QUOTE | '_' | '%' )
>     ;
>
> STRING_LITERAL
> @init
> {
>     int dquotes_count = 0;
>     int theStart = $start;
> }
>     :    QUOTE    { theStart = GETCHARINDEX(); }
>         (    ESCAPE_SEQUENCE
>         |    ~('\'' | '\\')
>         |    QUOTE QUOTE            { ++dquotes_count; }
>         )*
>                 { $start = theStart; EMIT(); }
>         QUOTE
>         {
>             if( dquotes_count > 0 ) // ONLY if was found ''
>             {
>                 pANTLR3_COMMON_TOKEN pToken = LEXSTATE->token;
>
>                 pANTLR3_STRING pTmpStr = pToken->getText( pToken );
>                 char* pStart = (char*) pTmpStr->chars;
>
>                 // We shrink the string in place, in the same buffer.
>                 while( dquotes_count > 0 )
>                 {
>                     char* pFirstQuote = strchr( pStart, '\'' );
>                     if( pFirstQuote == NULL ) // no quotes left at all
>                         break;
>
>                     if( *(pFirstQuote + 1) != '\'' ) // a lone quote, e.g. from \'
>                     {
>                         pStart = pFirstQuote + 1; // skip it and keep searching
>                         continue;
>                     }
>
>                     // Example: 'aa''bb''cc''dd'   =>   aa'bb'cc'dd
>                     // Move the tail, including the terminating NUL, one char left.
>                     ANTLR3_MEMMOVE( pFirstQuote + 1, pFirstQuote + 2,
>                                     strlen( pFirstQuote + 2 ) + 1 );
>
>                     // prepare for possible next loop:
>                     pStart = pFirstQuote + 1;
>                     pTmpStr->len--;
>                     --dquotes_count;
>                 }
>
>                 pToken->setText( pToken, pTmpStr );
>             }
>         }
>     ;
>
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address
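
The quote-collapsing step in the rule above can also be looked at on its
own, outside of ANTLR. What follows is only a minimal sketch in plain C,
not the poster's code and not part of the ANTLR runtime: the function name
collapse_doubled_quotes is hypothetical, and it assumes an 8-bit text
buffer with room for a terminating NUL at index len. It makes one pass with
a read index and a write index, so no repeated strchr/memmove is needed,
and it copies backslash escapes through untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an ANTLR API): collapse every doubled quote ('')
 * in buf[0..len) into a single quote, in place, in a single pass.
 * A backslash escape (\x) is copied through unchanged, so the quote in \'
 * is never treated as the first half of a doubled quote.
 * Assumes buf has room for a NUL at index len. Returns the new length. */
static size_t collapse_doubled_quotes(char *buf, size_t len)
{
    size_t r = 0; /* read position  */
    size_t w = 0; /* write position, always <= r */

    assert(buf != NULL);

    while (r < len) {
        if (buf[r] == '\\' && r + 1 < len) {
            /* Escape sequence: keep both characters as they are. */
            buf[w++] = buf[r++];
            buf[w++] = buf[r++];
        } else if (buf[r] == '\'' && r + 1 < len && buf[r + 1] == '\'') {
            /* Doubled quote: emit one quote, consume two. */
            buf[w++] = '\'';
            r += 2;
        } else {
            buf[w++] = buf[r++];
        }
    }
    buf[w] = '\0';
    return w;
}
```

Called on a writable buffer holding aa''bb''cc''dd with its strlen as len,
this leaves aa'bb'cc'dd in the buffer and returns the shortened length.
Inside a lexer action, that return value is what would be stored back into
the string's length field; whether that is safe for the runtime's string
type, and how it behaves for UTF16 input, are the same open questions the
post raises.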

