[antlr-interest] v2->v3 Skip chars in Lexer? Terence?

Sun Apr 17 01:06:25 PDT 2011

On 4/16/11 9:27 PM, "Jim Idle" <jimi at temporal-wave.com> wrote:

Hi Jim,

> It is for performance and has been talked about for 4 years, so we don't
> need to start it again.

Okay, but maybe it would be a good idea to add a code example to that FAQ
page about these quotes?

    http://www.antlr.org/wiki/pages/viewpage.action?pageId=1461

There is no C Target example on this page.

>  If we implement ! then you have to build the
> string in to every token and copy it,

That is not very clear to me, but OK.

I have seen in the book that it is possible to use labels in lexer rules:
    IDENT:  q1=DQUOTE  something  q2=DQUOTE

But how does this help? The book shows only an unhelpful example, an
action that uses all the labels:
        { $q1, something.text $q2 }

I thought we could do some kind of "rewrite" in the lexer, but no.
So what these labels are good for is not clear to me.

> but basically it is easy to strip
> leading and trailing characters as the tokens carry pointers, so get the
> start pointer, increment it, get the end point, decrement it, now
> 
> Do not use the built in $token.text->chars as this is slow and just for
> convenience. 

> The token holds a pointer to the start of the text in the
> original input stream, which is greatly faster and you don't do anything
> at all to the token until and if you use it.

So I must check the Token structure of the C target,
and I should find two pointers there, start and end, and adjust them.

OK, that is clear. Thank you, Jim.

> You know the token type, so can handle it appropriately.

Why should I care about the type?

I should adjust the pointers at the end of the lexer rule, right?

> It is a trivial piece of code and you will
> want a generic method/function for getting the string anyway. It takes
> less time to implement it than to worry about ! not being there any more
> :-)

The piece of code may be trivial, but it takes hours to learn your C code.
And this is where the problem is, IMO.

This is why I ask you again to add the best possible example to that FAQ
page. It should take you only 5 minutes, and it will help others.

Problem 2:
    Above, you describe an efficient solution only for skipping the
    FIRST/LAST quotes. Good. But as you can see, we also need to remove the
    INTERNAL (doubled) quotes, and this task requires creating a COPY of the
    string from the original input. Right?

STRING_LITERAL
@init
{
    int dquotes_count = 0;
}
    :    QUOTE 
        (    ESCAPE_SEQUENCE
        |    ~('\'' | '\\')
        |    QUOTE QUOTE            { ++dquotes_count; }
        )* 
        QUOTE 

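        {
            // Strip the opening and closing quote: subString() copies
            // everything between the first and the last character.
            pANTLR3_STRING pQuotedStr = GETTEXT();
            pANTLR3_STRING pStr = pQuotedStr->subString( pQuotedStr, 1,
                                                         pQuotedStr->len - 1 );

            char* pBase  = (char*) pStr->chars;
            char* pStart = pBase;

            // Collapse each doubled quote into a single one, in place.
            // Example: aabbcc''def  ->  aabbcc'def
            while( dquotes_count > 0 )
            {
                char* pFirstQuote = strchr( pStart, '\'' );
                size_t CharsToMove;

                if( pFirstQuote == NULL )
                    break;

                if( *(pFirstQuote + 1) != '\'' )
                {
                    // Lone quote (e.g. from an escape sequence):
                    // step past it without consuming a pair.
                    pStart = pFirstQuote + 1;
                    continue;
                }

                // Bytes after the second quote, plus the terminating NUL.
                CharsToMove = pStr->len - (size_t)( pFirstQuote + 2 - pBase ) + 1;

                ANTLR3_MEMMOVE( pFirstQuote + 1, pFirstQuote + 2, CharsToMove );

                // Resume scanning after the quote that was kept.
                pStart = pFirstQuote + 1;
                pStr->len--;
                dquotes_count--;
            }

            SETTEXT( pStr );
        }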
    ;

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]