[antlr-interest] MSVC 7.0

Ric Klaren klaren at cs.utwente.nl
Fri Oct 3 02:30:48 PDT 2003


Hi,

On Thu, Oct 02, 2003 at 05:18:04PM -0000, Arnar Birgisson wrote:
> STADVAER : "staðvær";
> 
> I recall that the ANTLR documentation states that the inputCharset for
> its metalanguage is 7-bit ASCII, so according to that, ANTLR should
> have yielded an error for this rule. However, this was translated
> directly to a string constant in the C++ file. (Note: this works fine in
> Java.)

I think it was raised to 8 bit at some point. The documentation is crappy
though.

> Then, somewhere along the way, the expected character becomes an int,
> and should be 0x000000f0, but is generated as 0xfffffff0. When 0xf0 is
> seen on the input, this causes a MismatchedCharException. When it tries
> to generate its message, it calls charName for 0xfffffff0 (the expected
> char), which in turn calls isprint, and since 0xfffffff0 is negative, it
> blows up.

Hmm, I had expected a problem like this one of these days. The runtime uses
a lot of ints where it should be using unsigneds, just to prevent these
sign-extension troubles. I have already started changing some of them, but
I have not touched this part yet.
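
To make the sign-extension trouble concrete, here is a minimal sketch. It
is not the actual ANTLR runtime code; widen_signed, widen_unsigned and
safe_isprint are hypothetical names made up for this illustration. It
assumes a two's complement platform where the byte 0xf0 stored in a signed
char has the value -16.

```cpp
#include <cctype>
#include <climits>

// Hypothetical helper: widen a character through a signed type.
// A byte with the high bit set (e.g. 0xf0, stored as -16) is
// sign-extended and comes out negative.
inline int widen_signed(signed char c) {
    return c;
}

// Hypothetical helper: go through unsigned char first, so the
// result always stays in the range 0..UCHAR_MAX.
inline int widen_unsigned(signed char c) {
    return static_cast<unsigned char>(c);
}

// Hypothetical guard: the <cctype> functions only accept EOF or a
// value representable as unsigned char, so reject anything else
// instead of passing it on to isprint.
inline bool safe_isprint(int ch) {
    return ch >= 0 && ch <= UCHAR_MAX && std::isprint(ch) != 0;
}
```

With a 32-bit int, widen_signed(-16) has the bit pattern 0xfffffff0, while
widen_unsigned(-16) gives 0xf0, which is what the comparison against the
input character needs.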

> I guess this is partly my fault since I didn't follow ANTLR's
> documentation carefully enough. Changing the rule to
> 
> STADVAER	:	"sta\360v\346r";
> 
> seems to fix this (it's butt-ugly though :o). However, I would like to
> point out that this worked in Java, with its multibyte string
> constants, and along the way, antlr.Tool never complained about its
> input.

antlr.Tool often does not complain where it should ;) To get back to the
point: I thought I had fixed all these quoting issues in the C++ codegen.
It seems I'll have to reinvestigate.
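
For reference, a small sketch of what the octal-escaped literal from the
workaround contains on the C++ side. The helper names stadvaer_literal and
byte_at are made up for this illustration, and it assumes the input is
ISO-8859-1 (Latin-1), where \360 is 0xf0 and \346 is 0xe6.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: the keyword spelled with octal escapes, as in
// the workaround rule. Each escape yields exactly one byte.
inline std::string stadvaer_literal() {
    return "sta\360v\346r";
}

// Hypothetical helper: read byte i as an unsigned value, so bytes
// above 0x7f are not sign-extended when compared.
inline unsigned byte_at(const std::string& s, std::size_t i) {
    return static_cast<unsigned char>(s[i]);
}
```

The literal is 7 bytes long, and byte_at reports 0xf0 at index 3 and 0xe6
at index 5, independent of the encoding of the grammar file itself.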

> I'm sending this in here mostly as a reference for others lexing
> non-7-bit ASCII data with a C++ lexer, in case they hit the same walls I
> did.

8 bit should work fine, apart from some caveats. I will look at this.
Multibyte encodings are a no-go in C++, though.

Cheers,

Ric
--
-----+++++*****************************************************+++++++++-------
    ---- Ric Klaren ----- j.klaren at utwente.nl ----- +31 53 4893722  ----
-----+++++*****************************************************+++++++++-------
     "Never argue with an idiot, for they will bring you down to their
              level and beat you with experience." --- Unknown

 
