[antlr-interest] How to use arabic letters in my tokens ?

Gavin Lambert antlr at mirality.co.nz
Wed Mar 26 13:38:16 PDT 2008


At 08:25 27/03/2008, Ahmed Hamouda wrote:
>I want to define a token as all possible letters that the user can use.
>These letters contain Arabic letters.
>I tried to add them by hand as the following ‘Ç’ | ‘È’ | ‘Ì’ ... and so on,
>but I received an error in the generation

Firstly, those don't appear Arabic to me; they look like ordinary
accented Latin characters.  Secondly, you can't write Unicode
characters directly in either ANTLRv2 or ANTLRv3, since ANTLRv2
doesn't support Unicode at all and ANTLRv3 still uses ANTLRv2 to
parse the grammars themselves.  (ANTLRv3 grammars can recognise
Unicode characters, though, when they are written as \uXXXX escapes.)

>I also tried to use these alternatives
>
>| '\u00c2' | '\u00c3' | '\u00c4' | '\u00c5' | '\u00c6' | '\u00c7' | '\u00c8' | '\u00c9'
>| '\u00c0' | '\u00ca' | '\u00cb' | '\u00cc' | '\u00cd' |
[...]

First, when there's a contiguous range you can specify it like so:
   '\u00c0'..'\u00c7'
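
If the list of code points is long, the ranges can be worked out mechanically instead of by eye.  The following is only a sketch in Python (the helper name antlr_ranges is made up for illustration and is not part of ANTLR); it assumes every code point fits in four hex digits, i.e. lies in the Basic Multilingual Plane:

```python
def antlr_ranges(code_points):
    """Collapse code points into ANTLR-style alternatives, merging contiguous runs.

    Hypothetical helper for illustration; assumes all values are <= 0xFFFF.
    """
    runs = []
    for cp in sorted(set(code_points)):
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp          # extend the current contiguous run
        else:
            runs.append([cp, cp])     # start a new run

    def esc(cp):
        return "'\\u%04x'" % cp       # e.g. 0xC0 -> '\u00c0'

    return " | ".join(
        esc(lo) if lo == hi else esc(lo) + ".." + esc(hi) for lo, hi in runs
    )


# The eight code points 0xC0..0xC7 collapse into a single range.
print(antlr_ranges(range(0xC0, 0xC8)))
```

Isolated code points stay as single alternatives, and only runs of consecutive values are merged with the `..` operator.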

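And again, those don't appear to be Arabic characters: U+00C0 to
U+00CD are accented Latin letters, whereas the Arabic block is
U+0600 to U+06FF.  Run "charmap" and make sure you switch it to
Unicode mode.  You're probably putting in the ANSI byte values from
your Arabic codepage instead of the Unicode code points.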
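
To illustrate the mix-up, here is a small Python sketch.  It assumes the Arabic codepage in question is Windows-1256 (the thread doesn't actually say which one is in use) and takes the three byte values from the characters quoted above:

```python
# Byte values behind the three characters quoted in the original message.
legacy_bytes = bytes([0xC7, 0xC8, 0xCC])

# Read as Windows-1252 (Western European) they are accented Latin letters.
as_latin = legacy_bytes.decode("cp1252")

# Read as Windows-1256 (assumed Arabic codepage) they are Arabic letters.
as_arabic = legacy_bytes.decode("cp1256")

# Print the escapes that would go into the grammar instead of '\u00c7' etc.
for byte, ch in zip(legacy_bytes, as_arabic):
    print("byte 0x%02X -> '\\u%04x'" % (byte, ord(ch)))
```

Under that assumption, the values to put in the grammar are the code points produced by the cp1256 decoding, not the raw byte values.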


