[antlr-interest] How to use arabic letters in my tokens ?
Gavin Lambert
antlr at mirality.co.nz
Wed Mar 26 13:38:16 PDT 2008
At 08:25 27/03/2008, Ahmed Hamouda wrote:
>I want to define a token that matches all the
>letters a user can use.
>These letters include Arabic letters.
>I tried to add them by hand as the following: Ç
>| È | Ì ... and so on, but I received an error in the generation
Firstly, those don't look Arabic to me; they're just
ordinary accented Latin characters. Secondly, you
can't write Unicode characters directly in a grammar
file for either ANTLRv2 or ANTLRv3, since ANTLRv2
doesn't support Unicode at all and ANTLRv3 still uses
ANTLRv2 to parse the grammars themselves. (Lexers
generated from ANTLRv3 grammars can recognise Unicode
characters though, as long as the grammar spells them
as '\uXXXX' escapes.)
>I also tried to use these alternatives
>
>| '\u00c2' | '\u00c3' | '\u00c4' | '\u00c5' |
>'\u00c6' | '\u00c7' | '\u00c8' | '\u00c9'
> | '\u00c0' |
> '\u00ca' | '\u00cb' | '\u00cc' | '\u00cd' |
[...]
First, when there's a contiguous range you can specify it like so:
'\u00c0'..'\u00c7'
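
As a rough illustration of the range notation (a sketch only, not part of ANTLR itself): the hypothetical helper to_antlr_ranges below takes a list of code points and collapses contiguous runs into the '\uXXXX'..'\uYYYY' form. It assumes every code point fits in four hex digits (i.e. is below U+10000).

```python
def to_antlr_ranges(codepoints):
    """Collapse code points into '\\uXXXX'..'\\uYYYY' alternatives.

    Hypothetical helper for illustration; assumes all code points < 0x10000.
    """
    runs = []
    for cp in sorted(set(codepoints)):
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp        # extend the current contiguous run
        else:
            runs.append([cp, cp])   # start a new run

    def esc(cp):
        # Format one code point as a quoted four-digit escape
        return "'\\u%04x'" % cp

    return " | ".join(
        esc(lo) if lo == hi else esc(lo) + ".." + esc(hi)
        for lo, hi in runs
    )


# Some of the code points from the quoted alternatives, in arbitrary order
print(to_antlr_ranges([0xC2, 0xC3, 0xC4, 0xC5, 0xC6, 0xC7, 0xC0, 0xCA]))
```

Isolated code points stay as single alternatives; only runs of consecutive values are merged into a range.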
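And again, those don't appear to be Arabic
characters: U+00C0 to U+00CD are accented Latin
letters, whereas the Arabic letters are in the
U+0600..U+06FF block. Run "charmap" and make sure
you switch it to Unicode mode. You're probably
putting in the ANSI encodings from your Arabic
codepage instead of the Unicode code points.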
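
To illustrate the codepage mix-up (a sketch under a stated assumption, not something confirmed in this thread): suppose the three characters quoted above were typed under the Windows-1256 Arabic codepage. Decoding the same bytes both ways shows why they display as Latin letters and which Unicode escapes they would actually correspond to. The variable names are illustrative only.

```python
# Assumption: the characters shown as "Ç | È | Ì" are really the bytes
# 0xC7, 0xC8, 0xCC entered under the Windows-1256 Arabic codepage.
raw = bytes([0xC7, 0xC8, 0xCC])

as_latin1 = raw.decode("latin-1")  # how the bytes look when read as Latin-1
as_arabic = raw.decode("cp1256")   # what they mean under the Arabic codepage

# Unicode escapes in the quoted '\uXXXX' form used in the grammar
escapes = ["'\\u%04x'" % ord(ch) for ch in as_arabic]

print(as_latin1)
print(as_arabic)
print(" | ".join(escapes))
```

If that assumption holds, the escapes to put in the grammar are in the 06xx range rather than 00cx.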