[antlr-interest] Eliminate characters in TOKEN

Ruslan Zasukhin ruslan_zasukhin at valentina-db.com
Wed Nov 23 08:05:48 PST 2011


On 11/23/11 11:59 AM, "Bart Kiers" <bkiers at gmail.com> wrote:

> Hi Rampon,
> 
> 
> On Wed, Nov 23, 2011 at 10:54 AM, Rampon Jerome <ramponjerome at yahoo.fr> wrote:
> 
>> ...
>> it complained that the output option must be AST.
>> If I add it to my grammar options, it still complains and returns an error.
>> It seems to add the option automatically if it is not there, but later it
>> still returns an error.
>> 
>> Is that normal?
>> 
> 
> Yes, the `!` to exclude characters from lexer rules (as was possible in v2)
> is no longer valid in v3 grammars.

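Yes, I also ran into this change in v3.
Here are examples from our Valentina SQL grammar, where we use a new trick
to leave out, e.g., the wrapping quotes: the rule remembers the character
index just after the opening quote, assigns it to $start, and calls EMIT()
before the closing quote is matched, so the emitted token's text contains
no quotes.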
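The trick can be modeled outside ANTLR with a minimal C sketch. It is not ANTLR runtime code: the `Token` struct, `scan_quoted`, and the inclusive start/stop indexes are hypothetical, and QUOTE is assumed to be `'\''`. The token's start index is moved past the opening quote and the token is "emitted" before the closing quote is consumed, so escapes stay in the text but the outer quotes do not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token record: start/stop are inclusive character indexes
   into the input, so the token text is a slice of the input buffer.
   For an empty literal, stop == start - 1. */
typedef struct {
    size_t start;
    size_t stop;
} Token;

/* Scan a literal whose opening single quote is at in[pos].
   Returns the index just past the closing quote, or 0 if unterminated.
   Escapes (backslash + char, and '') stay in the token text; only the
   outer quotes are left out. */
size_t scan_quoted(const char *in, size_t pos, Token *tok)
{
    assert(in[pos] == '\'');
    size_t i = pos + 1;                /* skip first quote */
    size_t the_start = i;              /* like theStart = GETCHARINDEX() */

    for (;;) {
        if (in[i] == '\0')
            return 0;                  /* no closing quote */
        if (in[i] == '\\') {           /* EscapeSequence: backslash + one char */
            if (in[i + 1] == '\0')
                return 0;
            i += 2;
        } else if (in[i] == '\'') {
            if (in[i + 1] != '\'')
                break;                 /* closing quote found */
            i += 2;                    /* QUOTE QUOTE pair */
        } else {
            i++;
        }
    }
    tok->start = the_start;            /* like $start = theStart */
    tok->stop = i - 1;                 /* "emit" before the closing quote */
    return i + 1;                      /* then consume the closing quote */
}
```

For the input `'it''s' rest`, the token covers `it''s` (indexes 1 to 5) and scanning resumes at index 7.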


//------------------------------------------------------------------------------
// String literals:

// caseSensitive = false, so we use only lowercase letters.
fragment
Letter
    :    'a'..'z'
    |   '@'
    ;
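The post does not show how `caseSensitive = false` is implemented. One common approach, assumed here, is to fold each lookahead character to lower case while leaving the input buffer untouched, so `Letter` only has to list `'a'..'z'`. The `CaseFoldStream` type and both functions below are hypothetical, not part of the ANTLR C runtime.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical input stream: the lookahead is lower-cased, but the
   buffer itself keeps its original case for token text. */
typedef struct {
    const char *data;
    size_t len;
    size_t index;
} CaseFoldStream;

/* Look ahead i characters (1-based, like LA(1)); -1 means end of input. */
int la_folded(const CaseFoldStream *s, size_t i)
{
    assert(i >= 1);
    size_t pos = s->index + i - 1;
    if (pos >= s->len)
        return -1;
    return tolower((unsigned char)s->data[pos]);
}

/* Letter : 'a'..'z' | '@' ;  tested against the folded lookahead */
int is_letter_la(const CaseFoldStream *s)
{
    int c = la_folded(s, 1);
    return (c >= 'a' && c <= 'z') || c == '@';
}
```

With this design the grammar stays small, and the original spelling is still available when the token text is read back from the buffer.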


fragment
EscapeSequence
    :    '\\' ( QUOTE|'\\'|'b'|'t'|'n'|'f'|'r' )
    ;


STRING_LITERAL
@init
{
    int escape_count = 0;
    int theStart = $start;
}
    :    QUOTE    
    
            { theStart = GETCHARINDEX(); }     // skip first quote
            
                (    EscapeSequence            { ++escape_count; }
                |    QUOTE QUOTE               { ++escape_count; }
                |    ~( QUOTE | '\\' )
                )* 
            
            {
                $start = theStart;
                EMIT();

                // Optimization: the lexer has already found the escaped chars,
                // and we even count them. We pass this count to the parser /
                // tree parser inside the token, so later algorithms can avoid
                // another scan of the literal just to check whether there is
                // anything to unescape. Knowing how many such symbols exist,
                // the algorithm can also return immediately once all known
                // escapes are resolved, and it can calculate exactly how much
                // RAM the unescaped string needs.
                //
                LTOKEN->user1 = escape_count;
            }
        
        QUOTE // and skip last quote
    ;
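A consumer of the `user1` count might look like the following minimal C sketch. `unescape_literal` and `escape_value` are hypothetical helpers, QUOTE is assumed to be `'\''`, and the text is assumed to be the token text without the outer quotes. Every escape (a backslash sequence or a doubled quote) is two characters that become one, so the result needs exactly `len - escape_count + 1` bytes, and once the last known escape is resolved the rest is copied in one block without further scanning.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Value of the character after a backslash, per EscapeSequence:
   \' \\ \b \t \n \f \r */
static char escape_value(char c)
{
    switch (c) {
    case 'b': return '\b';
    case 't': return '\t';
    case 'n': return '\n';
    case 'f': return '\f';
    case 'r': return '\r';
    default:  return c;                /* '\'' and '\\' stand for themselves */
    }
}

/* Precondition: text is valid STRING_LITERAL token text and escape_count
   is the exact count the lexer stored in user1. Caller frees the result. */
char *unescape_literal(const char *text, size_t len, int escape_count)
{
    assert(escape_count >= 0 && (size_t)escape_count * 2 <= len);
    char *out = malloc(len - (size_t)escape_count + 1);   /* exact size */
    if (out == NULL)
        return NULL;

    size_t i = 0, o = 0;
    int remaining = escape_count;
    while (remaining > 0) {            /* skipped entirely when count is 0 */
        if (text[i] == '\\') {
            out[o++] = escape_value(text[i + 1]);
            i += 2;
            remaining--;
        } else if (text[i] == '\'') {  /* '' stands for one quote */
            out[o++] = '\'';
            i += 2;
            remaining--;
        } else {
            out[o++] = text[i++];
        }
    }
    /* All known escapes resolved: copy the tail without scanning it. */
    memcpy(out + o, text + i, len - i);
    out[o + len - i] = '\0';
    return out;
}
```

For example, `unescape_literal("it''s", 5, 1)` yields `it's`, and a count of 0 turns the call into a single copy.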





//-----------------------------------------------------------------------
IDENT
    :    ( Letter | '_' ) ( Letter | '_' | Digit )*
    ;
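As a reading aid, the IDENT rule corresponds to the hand-written C matcher below. It is a sketch only: `match_ident` and its helpers are hypothetical, `Digit` is not shown in the post and is assumed to be `'0'..'9'`, and the input is assumed to be lower-cased already (per the caseSensitive note).

```c
#include <assert.h>
#include <stddef.h>

/* Letter : 'a'..'z' | '@' ; */
static int is_letter(char c)
{
    return (c >= 'a' && c <= 'z') || c == '@';
}

/* Digit is assumed to be '0'..'9'. */
static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Length of the IDENT match at the start of s, or 0 if none:
   IDENT : ( Letter | '_' ) ( Letter | '_' | Digit )* ; */
size_t match_ident(const char *s)
{
    if (!(is_letter(s[0]) || s[0] == '_'))
        return 0;
    size_t n = 1;
    /* the terminating '\0' is none of these, so the loop stops there */
    while (is_letter(s[n]) || s[n] == '_' || is_digit(s[n]))
        n++;
    return n;
}
```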
    

DELIMITED        // delimited_identifier
@init
{
    $type = IDENT;
    int theStart = $start;
}
    :
    (    DQUOTE    { theStart = GETCHARINDEX(); }
            ( ~(DQUOTE) | DQUOTE DQUOTE )+
                { $start = theStart; EMIT(); }
        DQUOTE

    |    BQUOTE    { theStart = GETCHARINDEX(); }
            ( ~(BQUOTE) | BQUOTE BQUOTE )+
                { $start = theStart; EMIT(); }
        BQUOTE

        // valentina/oracle extension: [asasas '' " sd "]
    |    LBRACK    { theStart = GETCHARINDEX(); }
            ( ~(']') )+
                { $start = theStart; EMIT(); }
        RBRACK
    )            
    ;
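Because DELIMITED keeps doubled delimiters (`""` or ``` `` ```) in its text, a later step has to collapse them. A minimal C sketch, with the hypothetical name `undouble_delimiter`: pass `'"'` or `` '`' `` as the delimiter. The `[...]` form needs no such step, since its rule simply stops at the first `]`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Collapse doubled delimiters in DELIMITED token text (outer delimiters
   already excluded). out needs len + 1 bytes; the result is never longer.
   Returns the new length. */
size_t undouble_delimiter(const char *text, size_t len, char q, char *out)
{
    size_t i = 0, o = 0;
    while (i < len) {
        out[o++] = text[i];
        if (text[i] == q && i + 1 < len && text[i + 1] == q)
            i += 2;                    /* "" or `` stands for one delimiter */
        else
            i++;
    }
    out[o] = '\0';
    return o;
}
```

For example, the text `my ""col""` becomes `my "col"`.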




-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]


