[antlr-interest] Range generation in C++ mode

Martin Probst mail at martin-probst.com
Tue Sep 7 05:56:53 PDT 2004

> Could you verify that in the file String.cpp (line 41) of your support
> library the character fed to isprint is ANDed with 0xff? I'm kind of
> surprised that a negative value gets to isprint. It might be that you
> still link to an older support library?

That's correct, it's not ANDed with 0xFF:
> if( isprint( ch ) )
Since which release is that change supposed to be in the files? I've got
String.cpp from the antlr-2.7.4 distribution (no doubt about that; the
version number is in the CVS $Id$ tag).
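
For reference, here is a minimal sketch of why the unmasked call is a problem. It is not the code from the ANTLR support library: apart from the quoted isprint line, the contents of String.cpp are not shown here, and the helper names is_printable_byte and escape_bytes are made up for illustration. On platforms where plain char is signed, every UTF-8 byte at or above 0x80 becomes negative, and isprint is only defined for values representable as unsigned char (or EOF). Converting to unsigned char first has the same effect as ANDing with 0xFF.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from the ANTLR support library): true if the
// byte is printable in the current C locale.
bool is_printable_byte(char ch)
{
    // Convert to unsigned char before calling isprint. Passing a negative
    // value other than EOF is undefined behaviour.
    return std::isprint(static_cast<unsigned char>(ch)) != 0;
}

// Hypothetical helper: copies printable bytes unchanged and writes every
// other byte as a \xNN escape.
std::string escape_bytes(const std::string& s)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const unsigned char u = static_cast<unsigned char>(s[i]);
        if (std::isprint(u)) {
            out += s[i];
        } else {
            out += "\\x";
            out += hex[u >> 4];    // high nibble
            out += hex[u & 0x0F];  // low nibble
        }
    }
    return out;
}
```

With the default "C" locale, the bytes of a multi-byte UTF-8 sequence are not printable, so they end up escaped rather than handed to isprint as negative values.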

>> I'm actually quite content with the way I handle it at the moment. The
>> only problem I will get is proper error reporting (with respect to
>> column numbers), but I'll either do this in a special error handler or
>> with an ugly client-side hack. Converting UTF-8 to UCS-2 or something
>> similar would not actually help me, as I have to compare the string
>> values of tokens to UTF-8 strings later on.
> Well, the actual result of the 'hack' would be that you still have UTF-8
> at the parser/treeparser level (in all getText calls and similar) and
> have correct line/column information, because the lexer internally uses
> 32-bit values.

Well, at the moment it works for me (tm). And in general it should even be
a tiny bit faster, if I'm not mistaken... Is that 'hack' publicly available
somewhere? I would like to try it out... the XQuery/XML specs use UCS-2,
so it would be quite handy to have a UCS-2 aware parser generator that is
transparent to later UTF-8 string handling.
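
To make the column-number issue concrete: a lexer that works on raw UTF-8 bytes counts bytes, not characters, so a column reported after a multi-byte character is too large. A rough sketch of the kind of client-side fix-up mentioned above follows. It is only an illustration, assuming well-formed UTF-8 input; the function name utf8_column is hypothetical and is not part of ANTLR.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts a byte offset within a line of well-formed
// UTF-8 into a count of code points, by skipping continuation bytes
// (those of the form 10xxxxxx).
std::size_t utf8_column(const std::string& line, std::size_t byte_offset)
{
    std::size_t column = 0;
    for (std::size_t i = 0; i < byte_offset && i < line.size(); ++i) {
        const unsigned char u = static_cast<unsigned char>(line[i]);
        if ((u & 0xC0) != 0x80) {
            ++column;  // lead byte or ASCII byte starts a new code point
        }
    }
    return column;
}
```

An error handler could pass the lexer's byte-based column through such a function before printing it, leaving the token text itself in UTF-8.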

