[antlr-interest] v2->v3 Skip chars in Lexer? Terrence?

Jim Idle jimi at temporal-wave.com
Sun Apr 17 08:32:59 PDT 2011


The lexer CREATES the token, so you do not access the token itself.


Don't forget antlr.markmail.org:

http://markmail.org/message/izyhuzbooerfw4tu


Don't deal with it in the parser; use a lexer rule for single-quoted
strings, then either use the trick below or just set the text of the
token. If you know that you will access the token anyway, then setting the
text is no problem, but the code below is easier.


 STRING_LITERAL
 @init
 {
     int    theStart = $start;
 }
         : '\''
             {
                 // Start the token text here, when we don't want the
                 // opening '
                 //
                 theStart = getCharIndex();

             }

              ~'\''*

             { $start = theStart; // There are other variants of this
               EMIT();
             }
             '\''
          ;
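
To make the effect of that rule concrete, here is a minimal standalone
sketch in plain C of the same idea: note the offset just after the opening
quote, consume everything that is not a quote, and end the token text just
before the closing quote. It does not use the ANTLR runtime at all; Span
and scan_string_literal are hypothetical names made up for this
illustration, and it assumes a NUL-terminated input with no escaped quotes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical span type: half-open range [start, stop) of offsets
 * into the input buffer. Not part of the ANTLR runtime. */
typedef struct {
    size_t start;
    size_t stop;
} Span;

/*
 * Scan a single-quoted literal that begins at input[pos].
 * On success, store the offsets of the text between the quotes in *out
 * and return the offset just past the closing quote.
 * Return 0 if input[pos] is not a quote or the literal is unterminated
 * (a successful scan always returns at least 2, so 0 is unambiguous).
 */
static size_t scan_string_literal(const char *input, size_t pos, Span *out)
{
    assert(input != NULL && out != NULL);

    if (input[pos] != '\'')
        return 0;
    pos++;                /* consume the opening quote */
    out->start = pos;     /* token text starts here, after the quote */

    while (input[pos] != '\0' && input[pos] != '\'')
        pos++;            /* the ~'\''* part of the rule */

    if (input[pos] != '\'')
        return 0;         /* ran off the end: unterminated literal */

    out->stop = pos;      /* token text ends before the closing quote */
    return pos + 1;       /* consume the closing quote */
}
```

Calling scan_string_literal("x = 'abc';", 4, &s) leaves s covering only
the characters abc, which is what the consumer of the token then sees.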


You could also look at the source code, or write your own token creation
method that automatically strips the quotes for certain token types, and
there are probably a few other solutions.

If performance is your priority, do not use $XXX.text.
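
As a rough illustration of the pointer approach mentioned in the quoted
message below (start pointer plus one, end pointer minus one), here is a
small sketch in plain C. QuotedToken, TextView, strip_quotes and
view_equals are hypothetical names for this example and are not the ANTLR
C runtime's types or functions; the sketch assumes the token records
pointers to its first and last characters in the original input, quotes
included.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token: points into the original input, quotes included.
 * Not the ANTLR runtime's token type. */
typedef struct {
    const char *start;  /* first character of the token (opening quote) */
    const char *stop;   /* last character of the token (closing quote) */
} QuotedToken;

/* A view of the token text without the quotes; nothing is copied. */
typedef struct {
    const char *text;
    size_t      length;
} TextView;

/* Step past the opening quote and stop short of the closing quote. */
static TextView strip_quotes(QuotedToken tok)
{
    TextView v;

    assert(tok.start != NULL && tok.stop != NULL);
    assert(tok.stop > tok.start);   /* needs at least the two quotes */

    v.text   = tok.start + 1;
    v.length = (size_t)(tok.stop - tok.start) - 1;
    return v;
}

/* Compare a view against a NUL-terminated string without allocating. */
static int view_equals(TextView v, const char *s)
{
    size_t n = strlen(s);
    return v.length == n && memcmp(v.text, s, n) == 0;
}
```

The view can be printed with printf("%.*s", (int)v.length, v.text), so a
copy of the text is only needed if something has to outlive the input
buffer.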

Jim



> -----Original Message-----
> From: Ruslan Zasukhin [mailto:ruslan_zasukhin at valentina-db.com]
> Sent: Sunday, April 17, 2011 1:24 AM
> To: Ruslan Zasukhin; Jim Idle; antlr-interest at antlr.org
> Subject: Re: [antlr-interest] v2->v3 Skip chars in Lexer? Terrence?
>
> On 4/17/11 11:06 AM, "Ruslan Zasukhin" <ruslan_zasukhin at valentina-db.com> wrote:
>
> >> but basically it is easy to strip
> >> leading and trailing characters as the tokens carry pointers, so get
> >> the start pointer, increment it, get the end point, decrement it, now
> >>
> >> Do not use the built in $token.text->chars as this is slow and just
> >> for convenience.
>
> >> The token holds a pointer to the start of the text in the original
> >> input stream, which is greatly faster and you don't do anything at
> >> all to the token until and if you use it.
>
> >> You know the token type, so can handle it appropriately.
>
> Hmmm,
>
> I have taken a look, and I do not see a way in the C target to access
> the token in a lexer rule.
>
> Do you mean that I should care about these pointers LATER, in parser?
>
> But then this again does not look like the best solution...
>     Java developers will remove them in lexer,
>     C developers in parser?
>
> Some kind of Zoo ...
>
>
> Please help   :-)
>
> And note that I am a C++ developer with 20 years of experience,
>     doing my best reading the ANTLR wiki, book, and examples,
>     who has worked with ANTLR v2 for 10 years ...
>     and I cannot resolve this *trivial* task in *the best way*
>     for v3 for about 14 hours now.
>
> I wonder how other C developers were able to resolve this problem?
>
> And maybe the docs, FAQs, and examples can be improved in this direction?
> Thank you, in advance :-)
>
> --
> Best regards,
>
> Ruslan Zasukhin
> VP Engineering and New Technology
> Paradigma Software, Inc
>
> Valentina - Joining Worlds of Information http://www.paradigmasoft.com
>
> [I feel the need: the need for speed]
>

