[antlr-interest] v2->v3 Skip wrapper and inside quotes in LITERAL of SQL // C-target [SOLVED v3]

Ruslan Zasukhin ruslan_zasukhin at valentina-db.com
Mon Apr 18 10:09:10 PDT 2011


Hi Guys,

Below is my solution for the string LITERAL rule of our SQL grammar.

GOOD:

* Everything is done at the LEXER level.
* Most literals take the efficient path of GETCHARINDEX() + EMIT().
* The more complex algorithm runs only when QUOTE QUOTE was found
(a rare case in practice).

BAD:

    * I don't know yet whether pTmpStr needs to be freed manually.
    * I don't know yet whether this solution will work for UTF16 input of the Lexer.
    * I have to access the produced Token object directly to modify ITS
copy of the text.
    * I still think this solution is much less trivial than the ! operator
of ANTLR v2.
    * The solution is very target-oriented IMO.
         IMO: the ideal would be ANTLR's own syntax to control the lexer's output.
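
On the UTF16 point: in UTF-16 every ASCII character occupies two bytes, one of which is zero, so strchr() over a char* view of such a buffer would stop at the first zero byte and never find the quote. A minimal sketch of the same collapse over 16-bit code units with an explicit length follows. The function name is hypothetical, and it is only an assumption that the token text of a UTF16 lexer can be reached as an array of 16-bit units; I have not checked how the C runtime exposes it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the ANTLR runtime.
   Collapses every '' pair into a single ' in place, working on 16-bit
   code units and an explicit length instead of a terminating zero byte.
   Backslash escapes (\' \_ \%) are copied through unchanged.
   Returns the new length in code units. */
size_t collapse_doubled_quotes_u16(uint16_t *s, size_t len)
{
    size_t src = 0;
    size_t dst = 0;

    while (src < len) {
        if (s[src] == 0x5C && src + 1 < len) {        /* backslash escape */
            s[dst++] = s[src++];
            s[dst++] = s[src++];
        } else if (s[src] == 0x27 && src + 1 < len && s[src + 1] == 0x27) {
            s[dst++] = 0x27;                          /* keep one quote */
            src += 2;                                 /* skip both of the pair */
        } else {
            s[dst++] = s[src++];
        }
    }
    return dst;
}
```

The caller would have to store the returned length back into whatever string object owns the buffer.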

Can anybody give hints for a better solution? Before offering ideas, please
carefully check the STRING_LITERAL rule below:
    QUOTE QUOTE must be allowed **inside** a STRING_LITERAL,
    and we should skip one of the two quotes.

Example:
     'aa''bb''cc''dd'   =>   aa'bb'cc'dd
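
To make the required transformation concrete outside of ANTLR, here is a minimal standalone sketch in plain C. The function name is hypothetical, it assumes an 8-bit, zero-terminated buffer that already has the outer quotes removed, and it is a single-pass variant rather than the exact code used in the rule below.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only.
   Collapses every '' pair into a single ' in place and returns the new
   length. Backslash escapes (\' \_ \%) are copied through unchanged, so
   an escaped quote is never mistaken for the first half of a pair. */
size_t collapse_doubled_quotes(char *s)
{
    char *src = s;
    char *dst = s;

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] != '\0') {
            *dst++ = *src++;              /* the backslash */
            *dst++ = *src++;              /* the escaped character */
        } else if (src[0] == '\'' && src[1] == '\'') {
            *dst++ = '\'';                /* keep one quote */
            src += 2;                     /* skip both of the pair */
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```

Because dst never runs ahead of src, the copy is safe in the same buffer and no temporary string is needed.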


//-------------------------------------------------------------
// String literals:

fragment
LETTER               // caseSensitive = false, so we use only small chars.
    :    'a'..'z'
    |   '@'
    ;

fragment
ESCAPE_SEQUENCE                      // Escape for VSQL can be:  \'  \_  \%
    :    '\\' ( QUOTE | '_' | '%' )
    ;

STRING_LITERAL
@init
{
    int dquotes_count = 0;
    int theStart = $start;
}
    :    QUOTE    { theStart = GETCHARINDEX(); }
        (    ESCAPE_SEQUENCE
        |    ~('\'' | '\\')
        |    QUOTE QUOTE            { ++dquotes_count; }
        )* 
                { $start = theStart; EMIT(); }
        QUOTE 
        {
            if( dquotes_count > 0 ) // ONLY if '' was found
            {
                pANTLR3_COMMON_TOKEN pToken = LEXSTATE->token;

                pANTLR3_STRING pTmpStr = pToken->getText( pToken );
                char* pStart = (char*) pTmpStr->chars;
                char* pEnd   = pStart + pTmpStr->len; // the terminating 0

                while( dquotes_count > 0 ) // shrink the string in the same buffer
                {
                    char* pFirstQuote = strchr( pStart, '\'' );
                    if( pFirstQuote == NULL )
                        break;

                    if( *(pFirstQuote + 1) != '\'' ) // not followed by a second quote?
                    {
                        pStart = pFirstQuote + 1;    // skip it and keep searching
                        continue;
                    }

                    // Example: 'aa''bb''cc''dd'   =>   aa'bb'cc'dd
                    // Shift the tail, including the terminating 0, one char left.
                    ANTLR3_MEMMOVE( pFirstQuote + 1, pFirstQuote + 2,
                                    pEnd - (pFirstQuote + 1) );

                    // prepare for a possible next loop:
                    pStart = pFirstQuote + 1;
                    --pEnd;
                    pTmpStr->len--;
                    --dquotes_count;
                }

                pToken->setText( pToken, pTmpStr );
            }
        }
    ;



