[antlr-interest] v2->v3 Skip chars in Lexer? For C-target [SOLVED 2.5]
Jim Idle
jimi at temporal-wave.com
Sun Apr 17 08:37:35 PDT 2011
Why do you have to copy the token? You just pass a pointer to it, and when
you want the text, use the pointers in the token.
Your solution is fine, but I don't think it works in all cases involving
fragments; I cannot remember why just now. There are solutions at
antlr.markmail.org
Jim
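
A minimal sketch of the "use the pointers in the token" idea, for the archive. It is an illustration only and does not use the ANTLR3 C runtime: TokenSpan and the span_* helpers are hypothetical names, and the sketch assumes a token can be modeled as two pointers into the input buffer, with stop pointing at the last character (inclusive).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a token: two pointers into the input buffer.
   stop is inclusive, so an empty span has stop == start - 1. */
typedef struct {
    const char *start;
    const char *stop;
} TokenSpan;

/* Number of characters currently covered by the span. */
static size_t span_length(const TokenSpan *t)
{
    assert(t->stop + 1 >= t->start);
    return (size_t)(t->stop - t->start + 1);
}

/* Drop the first and last character (the wrapping delimiters) in place.
   Nothing is allocated or copied. Returns 0 if the span holds fewer
   than two characters and is left unchanged, 1 otherwise. */
static int span_strip_delimiters(TokenSpan *t)
{
    if (t->stop - t->start < 1)
        return 0;
    ++t->start;
    --t->stop;
    return 1;
}

/* Compare the span with a NUL-terminated string without copying it. */
static int span_equals(const TokenSpan *t, const char *s)
{
    size_t n = span_length(t);
    return strlen(s) == n && memcmp(t->start, s, n) == 0;
}
```

With a span covering "my col" including its double quotes, span_strip_delimiters() leaves the span covering my col, and span_equals() can test it against a name without building a new string.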
> -----Original Message-----
> From: Ruslan Zasukhin [mailto:ruslan_zasukhin at valentina-db.com]
> Sent: Sunday, April 17, 2011 5:38 AM
> To: antlr-interest at antlr.org; Jim Idle
> Subject: Re: [antlr-interest] v2->v3 Skip chars in Lexer? For C-target
> [SOLVED 2.5]
>
> Hi All,
>
> After Jim pointed me to a more efficient way to skip the wrapping
> quotes, and after some more time, here is a working solution for the
> archive:
>
> //--------------------------------------------------------------------
> IDENT
> : ( LETTER | '_' ) ( LETTER | '_' | DIGIT )*
> ;
>
> // RZ 04/17/11: in ANTLR v3 there is no way to skip chars in the lexer. Oops.
> // Instead we use the trick suggested by Jim Idle on the ANTLR list:
> // skip the first/last chars of the token at the parser level.
> //
> DELIMITED // delimited_identifier
> :
> ( DQUOTE ( ~(DQUOTE) | DQUOTE DQUOTE )+ DQUOTE
> | BQUOTE ( ~(BQUOTE) | BQUOTE BQUOTE )+ BQUOTE
> | LBRACK ( ~(']') )+ RBRACK
> )
> ;
>
>
> And at the parser level, we take the token and apply ++ / -- to its
> start and stop pointers. The token type is also changed to IDENT with
> the help of a rewrite.
>
>
> //--------------------------------------------------------------------
> identifier
> : IDENT // regular_identifier
>
> | d=DELIMITED // delimited_identifier
> {
> ++$d->start;
> --$d->stop;
> }
> -> ^( IDENT[$d.text->chars] )
> ;
>
>
>
> ================
> Works... But ...
> I am far from sure that this solution is really more efficient, Jim.
>
> Yes, at the lexer level I had to use ->chars, and you say it is slower ...
>
> But at the parser level, besides the fast ++ / -- operations, we still
> need to create a second token, IDENT, and copy all values from the first ...
>
> sizeof(ANTLR3_COMMON_TOKEN_struct) is about 160-200 bytes.
>
> So allocating a new token and copying about 150 bytes just to skip TWO
> chars does not look like such a cheap operation. Also note that IDENTs
> are usually only 5-20 chars, much less than the 200 bytes of that
> structure.
>
>
> So maybe my first solution at the lexer level was not so bad?
>
> And I still have a TODO: skip chars inside of LITERAL at the parser
> level ...
> here we cannot just do ++ / --
>
>
> ================
> I do not yet see the whole picture of how the lexer works at the low
> level in C.
>
> Also, I have not yet seen any clear information about UTF encodings in
> the C target.
> I am going to ask about this in future letters.
>
>
> --
> Best regards,
>
> Ruslan Zasukhin
> VP Engineering and New Technology
> Paradigma Software, Inc
>
> Valentina - Joining Worlds of Information http://www.paradigmasoft.com
>
> [I feel the need: the need for speed]
>
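
On the TODO in the quoted message (characters inside a token cannot be dropped by moving two pointers): a hedged sketch of one way to handle it, assuming the characters to drop are doubled delimiters such as the DQUOTE DQUOTE pairs in the DELIMITED rule. The LITERAL rule itself is not shown in the thread, so this is a guess at the requirement. unescape_doubled is a hypothetical helper, not part of the ANTLR3 C runtime, and unlike the pointer trick it does need a destination buffer.

```c
#include <stddef.h>

/* Copy len characters from src into dst, writing each doubled quote
   character as a single one. src is the token body with the outer
   delimiters already excluded. dst must have room for len + 1 bytes.
   Returns the number of characters written, not counting the NUL. */
static size_t unescape_doubled(const char *src, size_t len,
                               char quote, char *dst)
{
    size_t i = 0;
    size_t out = 0;

    while (i < len) {
        dst[out++] = src[i];
        /* A quote followed by another quote is one escaped quote:
           step over both after writing the first. */
        if (src[i] == quote && i + 1 < len && src[i + 1] == quote)
            i += 2;
        else
            i += 1;
    }
    dst[out] = '\0';
    return out;
}
```

The output is never longer than the input, so a buffer of len + 1 bytes is always enough, and a token body with no doubled quotes comes out unchanged.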