[antlr-interest] Re: Error: How to deal with Special characters?

Martin Probst mail at martin-probst.com
Mon Jul 25 04:56:42 PDT 2005


Hi,

> What interested me about the message from Prekumar of 24 July 
> was how in some source code a hyphen ("-") could become 
> displayed as a "u" circumflex ("û") in DOS mode when the ISO 
> 8859-1 value of the first is 45 and the second 251 (a 
> difference of 206).

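That's because MS Windows uses different encodings in DOS boxes and
normal GUI apps (I think the first is an OEM code page such as CP437 or
CP850 and the second is windows-1252, which is close to ISO 8859-1, but
I'm not sure). The same byte can therefore be displayed as two
different characters depending on where you look at it.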
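
A small Python sketch of how one byte can display as two different
characters. It assumes the byte in the file was 0x96, the windows-1252
en dash that some Windows editors substitute for a typed hyphen. The
thread does not confirm which byte was actually in the file, so treat
this as one plausible explanation, not a diagnosis.

```python
# Assumption: the byte in the source file was 0x96, not a plain hyphen.
raw = bytes([0x96])

as_gui = raw.decode("cp1252")      # windows-1252, used by GUI apps
as_dos_437 = raw.decode("cp437")   # OEM code page, US DOS box
as_dos_850 = raw.decode("cp850")   # OEM code page, Western European DOS box

# The values quoted in the question, for comparison.
hyphen_value = ord("-")                             # 45
u_circumflex_value = "\u00fb".encode("latin-1")[0]  # 251

print(repr(as_gui), repr(as_dos_437), repr(as_dos_850))
print(hyphen_value, u_circumflex_value)
```

Under that assumption the GUI editor shows an en dash while the DOS
box shows "û", and no plain hyphen (45) is involved at all.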

So the encoding problem on Windows is generally an MS problem. ANTLR can
handle any encoding; it's just a lot more convenient to use the default
because then you can edit your files directly.

> What happens when you use the hyphen for the subtraction 
> operator in your source code?

You'll get an error as these are different characters.
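
To illustrate "different characters", here is a rough Python sketch
that reports every character falling outside an allowed vocabulary.
The ASCII_VOCAB set and the unknown_chars helper are hypothetical names
made up for this example. They only mimic the idea of a restricted
character vocabulary and are not ANTLR code.

```python
import unicodedata

# Hypothetical vocabulary: printable ASCII only.
ASCII_VOCAB = {chr(c) for c in range(0x20, 0x7F)}

def unknown_chars(source, vocabulary):
    """Return (index, char, Unicode name) for each char outside vocabulary."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(source)
        if ch not in vocabulary
    ]

good = "a - b"        # ASCII hyphen-minus, U+002D
bad = "a \u2013 b"    # en dash, U+2013, looks similar on screen

print(unknown_chars(good, ASCII_VOCAB))
print(unknown_chars(bad, ASCII_VOCAB))
```

The first expression passes, the second is flagged at index 2, which is
roughly the situation a lexer is in when it only knows "-" as the
subtraction operator.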

> What is the significance of it being in a comment?

Probably none, except that you would generally like to allow it (he had
a problem with charVocabulary).

> What coding system is being used in the source code?

The C++ part uses chars, the Java part uses UTF-16. But that doesn't
really matter, as you can of course match any byte value you like in
C++ mode (e.g. you don't have to write 'a' but can use 0x61 instead).
There is also a C++ Unicode mode which uses int values.
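
The point that a character literal and its numeric value are
interchangeable can be sketched in Python. The code_units helper is a
hypothetical name for this example. It only shows the numbers that a
byte-based or UTF-16-based lexer would see, and says nothing about how
ANTLR stores them internally.

```python
def code_units(text, encoding):
    """Return the integer code units of text in the given encoding."""
    data = text.encode(encoding)
    if encoding.startswith("utf-16"):
        # UTF-16 code units are two bytes wide.
        return [int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2)]
    return list(data)

byte_a = code_units("a", "latin-1")            # one 8-bit value
utf16_a = code_units("a", "utf-16-be")         # one 16-bit value
utf16_dash = code_units("\u2013", "utf-16-be") # en dash as a 16-bit value

print([hex(v) for v in byte_a + utf16_a + utf16_dash])
```

For 'a' both views give 0x61, so matching 'a' or matching 0x61 is the
same thing. An en dash has no ISO 8859-1 value at all and only shows up
as a single unit (0x2013) in the 16-bit view.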

Martin


