[antlr-interest] unicode support

Sriram Durbha cintyram at yahoo.com
Wed Dec 18 14:49:46 PST 2002


Hi,
  If I were to write a parser-based tool (which I am pretty likely to
do) this spring/summer for an Indian language (Telugu), I would want
the following features:
1. To be able to clearly specify which character set and encoding I
assume the document is in. As you might know, there are quite a few
options here, and most documents written by South Indians will probably
contain multiple languages, as some of us who were educated in English
medium don't know the appropriate words to use in Telugu :))

2. I am learning how to render Telugu and other fonts; it looks like
the tool set will include FreeType, some OpenType fonts, and ICU.
   I want to allow programming in native languages. All this resource
bundling and stuff is way too much for simple applications that can be
handled otherwise. I mean, if I know that my application will be used
only by Telugu or only by Oriya people, I want to be able to develop it
the way I develop applications for English readers now, and I want a
parser with which I can easily parse all the data files etc.
The problem with Telugu and similar languages is that one character in
Telugu is, in most cases, a composite of several Unicode characters.
An example is ksha; yeah, that's one letter :)
Or shtra; this is also one letter. But it is almost impossible to write
all this in a lexer so that each letter is passed as one token, unless
the document format also allows for character-spacing characters!! That
would bloat the document like crazy.
But once I figure out how to render the stuff on the screen, I want to
be able to write an antlr grammar in a Telugu editor and run java
antlr.Tool on that texttel.g file.
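
To make the composite-letter problem above concrete, here is a rough
sketch in plain Java (no antlr involved). It assumes ksha is spelled
KA + VIRAMA + SSA and shtra is spelled SSA + VIRAMA + TTA + VIRAMA + RA,
and it uses simplified code point ranges from the Unicode Telugu block.
The class and method names are made up for illustration; this is not
how any existing lexer does it.

```java
import java.util.ArrayList;
import java.util.List;

class TeluguAksharas {
    static final int VIRAMA = 0x0C4D; // TELUGU SIGN VIRAMA, joins consonants

    // Simplified ranges from the Unicode Telugu block (U+0C00..U+0C7F).
    static boolean isConsonant(int cp) {
        return cp >= 0x0C15 && cp <= 0x0C39;
    }

    static boolean isDependentSign(int cp) {
        return (cp >= 0x0C3E && cp <= 0x0C4C)   // dependent vowel signs
            || (cp >= 0x0C01 && cp <= 0x0C03);  // candrabindu, anusvara, visarga
    }

    // Groups code points into user-perceived letters: a sign or virama
    // attaches to the current letter, and so does a consonant that
    // directly follows a virama.
    static List<String> split(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int prev = -1;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            boolean attach = cur.length() > 0
                && (cp == VIRAMA || isDependentSign(cp)
                    || (isConsonant(cp) && prev == VIRAMA));
            if (!attach && cur.length() > 0) {
                out.add(cur.toString());
                cur.setLength(0);
            }
            cur.appendCodePoint(cp);
            prev = cp;
            i += Character.charCount(cp);
        }
        if (cur.length() > 0) {
            out.add(cur.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String ksha = "\u0C15\u0C4D\u0C37";              // assumed spelling of ksha
        String shtra = "\u0C37\u0C4D\u0C1F\u0C4D\u0C30"; // assumed spelling of shtra
        System.out.println(ksha.codePointCount(0, ksha.length())
            + " code points, " + split(ksha).size() + " letter(s)");
        System.out.println(shtra.codePointCount(0, shtra.length())
            + " code points, " + split(shtra).size() + " letter(s)");
    }
}
```

A lexer rule for "one letter" would have to encode roughly this
grouping, which is why writing it by hand for every conjunct gets
painful.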

This file can be read by a Telugu-knowing person who has also learnt
antlr, but internally all the characters, including the keywords etc.,
will be represented as Unicode characters.
Since I will use such an editor, I will expect it to set the
appropriate variables like charVocabulary for me, or at least put them
in a place where I can see and copy them. As for how exactly that
editor will know the values, maybe we should follow one or all of the
approaches in the preceding discussion.
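
One possible way an editor could come up with a charVocabulary value is
to scan the grammar text and report the smallest and largest character
it sees. The sketch below is only an assumption about how that might
look: the class and method names are hypothetical, and the output
format assumes antlr's lexer option is written in the form
charVocabulary = '\u0000'..'\uFFFE'; which should be checked against
the antlr documentation.

```java
import java.util.Locale;

class CharVocabularyHint {
    // Builds an option line whose range covers every UTF-16 code unit
    // found in the sample text (hypothetical helper, not part of antlr).
    static String optionFor(String sample) {
        if (sample.isEmpty()) {
            throw new IllegalArgumentException("empty sample");
        }
        char min = Character.MAX_VALUE;
        char max = Character.MIN_VALUE;
        for (int i = 0; i < sample.length(); i++) {
            char c = sample.charAt(i);
            if (c < min) min = c;
            if (c > max) max = c;
        }
        return String.format(Locale.ROOT,
            "charVocabulary = '\\u%04X'..'\\u%04X';", (int) min, (int) max);
    }

    public static void main(String[] args) {
        // A made-up line mixing ASCII with Telugu letters.
        System.out.println(optionFor("if \u0C15\u0C4D\u0C37 then"));
    }
}
```

A tight range like this only covers the text that was scanned, so an
editor would probably want to widen it to the whole Telugu block, or to
the full 16-bit range, before writing it into the grammar.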

cheers
ram





