[antlr-interest] Strange behaviour with '\379' character

Ric Klaren klaren at cs.utwente.nl
Mon Jun 23 02:43:33 PDT 2003


Hi,

On Sun, Jun 22, 2003 at 09:40:25PM +0200, berserksangr wrote:
> RK> I guess you're making a mistake while converting the hex values to octal ?
> RK> So as workaround you could try using octal. I'll check for the hex
> RK> constants tomorrow.
>
> So which way in my rules I could represent i.e. those unicode (looks like they are hex):
> 0x017A 0x017C

The C++ codegen does not support wide characters, only bytewise input.
There is an MSVC-centric patch floating around the archives that does at
least some of the work needed for wide character support in C++.

> Do you mean, that I should convert them hex-> oct and then write
> '\OCT_VALUE', for example '\577'?

As far as I can tell from the docs and the code in antlr, a character
literal can be \xxx for an octal value (one char wide, each x from [0-7]) or
\uXXXX for 16 bit wide characters. I just tested the codegen in C++
and it leaves out leading zeroes.
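
To make the two forms concrete, here is a small Python sketch (not part of
antlr; the helper name antlr_char_escape is made up for illustration). It
assumes an octal escape can hold at most one byte, i.e. values up to \377,
and falls back to \uXXXX for anything larger:

```python
def antlr_char_escape(code_point: int) -> str:
    """Pick an escape form for a code point (hypothetical helper).

    Assumption: an octal escape is one char wide, so it only covers
    0..0o377 (0..255). Larger 16-bit values use the \\uXXXX form.
    """
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("only 16-bit code points are covered here")
    if code_point <= 0o377:
        # Three octal digits, zero padded, e.g. 0x41 -> \101
        return "\\%03o" % code_point
    # Four hex digits, e.g. 0x017A -> \u017A
    return "\\u%04X" % code_point


if __name__ == "__main__":
    for cp in (0x017A, 0x017C):
        # oct() shows the raw octal value; it does not fit in one byte
        print(hex(cp), oct(cp), antlr_char_escape(cp))
```

For 0x017A and 0x017C the raw octal values are 572 and 574, both above
377, so under that assumption the \uXXXX form is the one to use.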

It looks like the lexer of antlr accepts incorrect octal values, and does
something 'funny' with them.
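
As a rough illustration of what "incorrect" means here, the following
Python sketch checks an escape by hand. It is not how antlr's lexer works;
the function name is hypothetical, and the one-byte upper limit (\377) is
an assumption based on the octal form being one char wide:

```python
import re


def is_valid_octal_escape(literal: str) -> bool:
    """Check a backslash escape such as '\\101' (hypothetical helper).

    Valid means: one to three digits, all in 0-7, and a value that
    fits in one byte (assumed limit: 0o377).
    """
    match = re.fullmatch(r"\\([0-7]{1,3})", literal)
    if match is None:
        return False  # wrong shape, or a digit outside 0-7
    return int(match.group(1), 8) <= 0o377


if __name__ == "__main__":
    for text in ("\\379", "\\577", "\\377", "\\101"):
        print(text, is_valid_octal_escape(text))
```

Under these assumptions '\379' fails because 9 is not an octal digit, and
'\577' fails because its value (383) does not fit in one byte.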

> Sorry for this lame question, but I usually write in Java with all
> unicode details running under the hood.

It's not that lame actually, had to check around a bit myself too ;)

Cheers,

Ric
--
-----+++++*****************************************************+++++++++-------
    ---- Ric Klaren ----- j.klaren at utwente.nl ----- +31 53 4893722  ----
-----+++++*****************************************************+++++++++-------
  "I think we better split up."
  "Good idea. We can do more damage that way."
  --- Ghostbusters

 
