[antlr-interest] Additional char from LEXER->getText

Thu Aug 30 07:57:14 PDT 2012

Hi,

There seems to be a problem in the C-target lexer: getText returns one additional character at the end of the text.

I have this lexer rule:

UNDERSCORE_CHARSET:		UNDERLINE_SYMBOL LETTER_WHEN_UNQUOTED+ { $type = check_charset($text); };

For input like:

SELECT _utf8 'text'

I actually get the string "_utf8 " (note the trailing space), which is not correct (I have the usual whitespace rule, of course). I think either LEXER->getText itself is wrong (its end pointer is one too far) or antlr38BitSubstr is. Looking at the code of the latter, I wonder why there's that +1. When a start and an end pointer point to the same place in memory, I would expect to get an empty string back, not the single char at the start position.
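
To make the two possible conventions concrete, here is a minimal sketch in plain C. It does not use the ANTLR runtime at all: substr_inclusive and substr_exclusive are hypothetical helpers written only for illustration. The first treats the end pointer as pointing at the last char (hence the +1), the second treats it as pointing one past the last char. Whether the runtime's substring function is meant to use the inclusive convention is an assumption on my part. If it is, then the +1 is fine and the caller would be passing an end pointer that is one too far.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT part of the ANTLR runtime.
 * Inclusive convention: copies the chars from start up to AND INCLUDING end,
 * so the length is end - start + 1. With start == end this yields one char. */
char *substr_inclusive(const char *start, const char *end)
{
    assert(start != NULL && end != NULL && end >= start);
    size_t len = (size_t)(end - start) + 1;
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, start, len);
    out[len] = '\0';
    return out;
}

/* Hypothetical helper, NOT part of the ANTLR runtime.
 * Half-open convention: copies the chars in [start, end), so the length is
 * end - start. With start == end this yields the empty string. */
char *substr_exclusive(const char *start, const char *end)
{
    assert(start != NULL && end != NULL && end >= start);
    size_t len = (size_t)(end - start);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, start, len);
    out[len] = '\0';
    return out;
}
```

For the input above, '_' sits at offset 7 and the space after "_utf8" at offset 12. Passing offsets 7 and 12 to the inclusive variant reproduces the "_utf8 " I am seeing, while passing 7 and 11 (or using the half-open variant with 7 and 12) gives "_utf8". The caller owns the returned buffer and has to free() it.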

I can work around this problem via pANTLR3_STRING->len - 1, but ...

Mike
-- 
www.soft-gems.net