[antlr-interest] Additional char from LEXER->getText

Jim Idle jimi at temporal-wave.com
Thu Aug 30 11:18:54 PDT 2012


Actually, those routines are really only there for convenience. You will
find them too slow and cumbersome for anything complicated. It is
better to use the pointer to the input stream directly and avoid any
copying and malloc() calls.

However, is this because you have UTF-8 input but are using the 8-bit
input stream?

Jim


> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Mike Lischke
> Sent: Thursday, August 30, 2012 7:57 AM
> To: ANTLR Mailing List
> Subject: [antlr-interest] Additional char from LEXER->getText
>
> Hi,
>
> there seems to be a problem in the C-target lexer, which returns an
> additional char in getText.
>
> I have this lexer rule:
>
> UNDERSCORE_CHARSET: UNDERLINE_SYMBOL LETTER_WHEN_UNQUOTED+
>     { $type = check_charset($text); };
>
> For input like:
>
> SELECT _utf8 'text'
>
> I actually get the string "_utf8 ", which is not correct (I have the
> usual white space rule, of course). I think either LEXER->getText itself
> is wrong (the end pointer is one too far) or antlr38BitSubstr is. Looking
> at the code of the latter, I wonder why there's that +1. When I have
> start and end pointers pointing to the same place in memory, I would
> expect to get an empty string returned, not the single char at the
> start position.
>
> I can work around this problem via pANTLR3_STRING->len - 1, but ...
>
> Mike
> --
> www.soft-gems.net

