[antlr-interest] v2->v3 Skip chars in Lexer? Terrence?
Ruslan Zasukhin
ruslan_zasukhin at valentina-db.com
Sun Apr 17 01:06:25 PDT 2011
On 4/16/11 9:27 PM, "Jim Idle" <jimi at temporal-wave.com> wrote:
Hi Jim,
> It is for performance and has been talked about for 4 years, so we don't
> need to start it again.
Okay, but maybe it would be a good idea to add a code example about these
quotes to that FAQ page?
http://www.antlr.org/wiki/pages/viewpage.action?pageId=1461
There is no C Target example on this page.
> If we implement ! then you have to build the
> string in to every token and copy it,
Not very clear, but OK.
I have seen in the book that it is possible to use labels in the lexer:
IDENT: q1=DQUOTE something q2=DQUOTE
But how does this help? The book shows only a useless example,
an action that uses all the labels:
{ $q1, something.text $q2 }
I thought we could do some kind of "rewrite" in the lexer, but no.
So it is not clear what the use of that is.
> but basically it is easy to strip
> leading and trailing characters as the tokens carry pointers, so get the
> start pointer, increment it, get the end point, decrement it, now
>
> Do not use the built in $token.text->chars as this is slow and just for
> convenience.
> The token holds a pointer to the start of the text in the
> original input stream, which is greatly faster and you don't do anything
> at all to the token until and if you use it.
So I must check the token structure of the C target,
find the two start/end pointers there, and correct them.
OK, clear. Thank you, Jim.
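To check that I understand the idea, here is a minimal sketch in plain C. It does not use the ANTLR runtime at all: SketchToken and strip_quotes are hypothetical names, and the struct is only a stand-in for whatever start/end fields the real C target token has. The point is just that stripping the quotes means narrowing a span by one char at each end, with no copy.

```c
#include <stddef.h>

/* Hypothetical stand-in for a token: NOT the real ANTLR3 token struct. */
typedef struct {
    const char *start;  /* first char of the token text in the input buffer */
    const char *stop;   /* last char of the token text (inclusive) */
} SketchToken;

/* Narrow the span by one char at each end, dropping the enclosing quotes.
   Returns the length of the inner text; *inner points into the input,
   so the result is NOT NUL-terminated. */
static size_t strip_quotes(const SketchToken *tok, const char **inner)
{
    size_t len = (size_t)(tok->stop - tok->start) + 1;

    if (len < 2) {              /* too short to carry two quotes */
        *inner = tok->start;
        return len;
    }
    *inner = tok->start + 1;    /* skip the opening quote */
    return len - 2;             /* drop the opening and closing quotes */
}
```

The caller gets a pointer and a length into the original input, so nothing is allocated until the text is really needed.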
> You know the token type, so can handle it appropriately.
Why should I care about the type?
I should correct the pointers at the end of the lexer rule, right?
> It is a trivial piece of code and you will
> want a generic method/function for getting the string anyway. It takes
> less time to implement it than to worry about ! not being there any more
> :-)
The piece of code may be trivial, but it takes hours to learn your C code.
And this is where the problem is, IMO.
This is why I ask you again to add a best-of-the-best example to that FAQ
page. It should take you only 5 minutes, and it will help others.
Problem 2:
What you describe above is an effective solution only for skipping the
FIRST/LAST quotes. Good. But as you can see, we also need to remove INTERNAL
quotes, and this task requires creating a COPY of the string from the
original input.
Right?
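STRING_LITERAL
@init
{
    int dquotes_count = 0;
}
    :   QUOTE
        (   ESCAPE_SEQUENCE
        |   ~('\'' | '\\')
        |   QUOTE QUOTE { ++dquotes_count; }
        )*
        QUOTE
        {
            // Remove the first and the last chars (subString makes a copy):
            pANTLR3_STRING pQuotedStr = GETTEXT();
            pANTLR3_STRING pStr = pQuotedStr->subString( pQuotedStr, 1,
                                                         pQuotedStr->len - 1 );
            char* pStart = (char*) pStr->chars;

            // Collapse each doubled quote '' into a single quote:
            while( dquotes_count > 0 )
            {
                char* pFirstQuote = strchr( pStart, '\'' );
                if( pFirstQuote == NULL )   // no more quotes, stop
                    break;

                if( *(pFirstQuote + 1) != '\'' ) // second quote?
                {
                    // Single quote (e.g. from an escape): skip past it,
                    // otherwise strchr finds the same quote again.
                    pStart = pFirstQuote + 1;
                    continue;
                }

                // Example: 'aabbcc''def'
                // Count from the start of the whole string, not from pStart,
                // so the tail length stays correct after the first loop.
                int CharsOnLeft = (int)( pFirstQuote - (char*) pStr->chars ) + 1;
                int CharsToMove = (int) pStr->len - CharsOnLeft; // tail + NUL
                ANTLR3_MEMMOVE( pFirstQuote + 1, pFirstQuote + 2,
                                CharsToMove );

                // prepare for possible next loop:
                pStart = pFirstQuote + 1;
                pStr->len--;
                --dquotes_count;
            }
            SETTEXT( pStr );
        }
    ;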
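For comparison, here is the same unescaping as a standalone sketch in plain C, without the ANTLR string API. The function name copy_collapsing_quotes is hypothetical, and the sketch assumes the input is the text between the enclosing quotes, given as pointer plus length. It ignores backslash escape sequences and only handles doubled quotes. It shows why a copy is needed: the result is shorter than the source, so it cannot be a simple start/end span over the original input.

```c
#include <stddef.h>

/* Copy src[0..len) into dst, turning each doubled quote '' into a single '.
   dst must have room for len + 1 bytes. Returns the number of chars written,
   not counting the terminating NUL. */
static size_t copy_collapsing_quotes(char *dst, const char *src, size_t len)
{
    size_t out = 0;

    for (size_t i = 0; i < len; ++i) {
        dst[out++] = src[i];
        /* If this quote is immediately followed by another one,
           skip the second quote of the pair. */
        if (src[i] == '\'' && i + 1 < len && src[i + 1] == '\'')
            ++i;
    }
    dst[out] = '\0';
    return out;
}
```

This is a single pass over the text, so the cost does not grow with the number of doubled quotes the way repeated memmove calls do.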
--
Best regards,
Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc
Valentina - Joining Worlds of Information
http://www.paradigmasoft.com
[I feel the need: the need for speed]