[antlr-interest] How to use arabic letters in my tokens ?

Ahmed Hamouda ahmedh at horizonssoftware.com
Wed Mar 26 13:54:59 PDT 2008


Thank you for your reply.

>when there's a contiguous range you can specify it like so:
>   '\u00c0'..'\u00c7'

When I use the range, I get a compiler error in the generated code saying that there is no definition for the method "MatchRange".

>And again, those don't appear to be Arabic 
>characters.  Run "charmap" and make sure you 
>switch it to Unicode mode.  You're probably 
>putting in the ANSI encodings from your Arabic codepage instead.

Sorry, I don't know what "charmap" is. Could you tell me how to get a table of the Unicode code points for these characters?
Thank you
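
One way to get such a table without charmap is a few lines of Python using the standard unicodedata module. This is only a sketch: the function name arabic_letter_table is made up for illustration, and the default range U+0621..U+064A (the basic Arabic letters) is an assumption about which characters are wanted.

```python
import unicodedata

def arabic_letter_table(start=0x0621, end=0x064A):
    """Return (escape, character, Unicode name) rows for a code point range.

    The default range is an assumption: it covers the basic Arabic
    letters, not the whole Arabic block.
    """
    rows = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        # Unassigned code points have no name; skip them.
        name = unicodedata.name(ch, "")
        if name:
            rows.append((f"\\u{cp:04x}", ch, name))
    return rows

if __name__ == "__main__":
    for escape, ch, name in arabic_letter_table():
        print(escape, ch, name)
```

Each row starts with the escape in the '\uXXXX' form that the grammar snippets in this thread use.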

Best Regards

Ahmed Hamouda (MCTS)
Software Engineer
Horizons Software
Address: 93 Haroun Al Rasheed Street, Heliopolis, Cairo, Egypt. 11351. 
Tel:         +202-2644-3709
Mobile:    +2010-33-55-879
Fax:        +202-2632-0661
Website:   www.horizonssoftware.com


-----Original Message-----
From: Gavin Lambert [mailto:antlr at mirality.co.nz] 
Sent: Wednesday, March 26, 2008 10:38 PM
To: Ahmed Hamouda; antlr-interest at antlr.org
Subject: Re: [antlr-interest] How to use arabic letters in my tokens ?

At 08:25 27/03/2008, Ahmed Hamouda wrote:
>I want to define a token that matches all the
>letters a user can type.
>These letters include Arabic letters.
>I tried to add them by hand as follows: 'Ç'
>| 'È' | 'Ì' ... and so on, but I received an error during generation.

Firstly, those don't appear to be Arabic to me; just 
regular extended Latin characters.  Secondly, you 
can't write Unicode characters directly in either 
ANTLRv2 or ANTLRv3 since ANTLRv2 doesn't support 
Unicode at all and ANTLRv3 still uses ANTLRv2 to 
parse the grammars themselves.  (ANTLRv3 grammars 
can recognise Unicode characters though.)

>I also tried to use these alternatives
>
>| '\u00c2' | '\u00c3' | '\u00c4' | '\u00c5' | 
>'\u00c6' | '\u00c7' | '\u00c8' | '\u00c9'
>                                 | '\u00c0' | 
> '\u00ca' | '\u00cb' | '\u00cc' | '\u00cd' |
[...]

First, when there's a contiguous range you can specify it like so:
   '\u00c0'..'\u00c7'

And again, those don't appear to be Arabic 
characters.  Run "charmap" and make sure you 
switch it to Unicode mode.  You're probably 
putting in the ANSI encodings from your Arabic codepage instead.
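
A minimal Python sketch of the mix-up described above. It assumes the letters were typed in the Windows Arabic code page (cp1256) and then displayed as Western European (cp1252); neither code page is confirmed in this thread, and the helper names are made up for illustration.

```python
def reinterpret_as_arabic(text, shown_as="cp1252", typed_as="cp1256"):
    """Recover the characters that were probably intended.

    Assumption: the bytes were produced in `typed_as` but displayed
    using `shown_as`. Both defaults are guesses, not facts from the thread.
    """
    raw = text.encode(shown_as)   # back to the original single bytes
    return raw.decode(typed_as)   # read those bytes as Arabic

def antlr_escapes(text):
    """Format each character as a quoted '\\uXXXX' literal."""
    return [f"'\\u{ord(ch):04x}'" for ch in text]

if __name__ == "__main__":
    intended = reinterpret_as_arabic("ÇÈÌ")
    print(intended)
    print(" | ".join(antlr_escapes(intended)))
```

Under those assumptions, 'Ç', 'È' and 'Ì' map to alef, beh and jeem, whose escapes are '\u0627', '\u0628' and '\u062c' rather than values in the '\u00c0' range.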


