[antlr-interest] Unicode Support

Prashant Deva prashant.deva at gmail.com
Wed Jul 5 13:24:08 PDT 2006


Hi Rowan,
  Look at the docs for the 'charVocabulary' option in the lexer; it lets you
specify a Unicode character range for the lexer.
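
As a rough sketch only, assuming ANTLR 2.x grammar syntax, the option might be
set like this. The lexer name (UnicodeLexer), the rule names (ID, STRING) and
the choice of the CJK Unified Ideographs block (U+4E00 to U+9FFF) for
identifiers are illustrative assumptions, not something taken from your
grammar. How well the C/C++ code generator copes with wide characters is a
separate question that this fragment does not settle.

```antlr
// Hypothetical lexer; all names here are for illustration only.
class UnicodeLexer extends Lexer;

options {
    // Widen the lexer's character vocabulary from the default
    // to (nearly) the whole 16-bit Unicode range.
    charVocabulary = '\u0000'..'\uFFFE';
}

// Identifiers: ASCII letters, underscore, or CJK ideographs,
// followed by any of those or ASCII digits.
ID  :   ( 'a'..'z' | 'A'..'Z' | '_' | '\u4E00'..'\u9FFF' )
        ( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9' | '\u4E00'..'\u9FFF' )*
    ;

// String literals: any character in the vocabulary except
// the closing quote and line breaks.
STRING
    :   '"' ( ~( '"' | '\n' | '\r' ) )* '"'
    ;
```

The set negation in STRING is taken relative to charVocabulary, so widening
the vocabulary is what should allow Chinese text inside string literals.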

-- 
Prashant Deva
Creator, ANTLR Studio
Founder, Placid Systems, www.placidsystems.com

On 7/5/06, Rowan Woodhouse <rowan at querix.com> wrote:
>
> Hi,
>
> I've been looking through the archives/web site etc. to try to figure this
> out but I haven't been able to come up with a definite answer, so here goes.
>
> I am looking at writing a lexer in C/C++ that can handle ASCII or Unicode
> encoded input files and allow the use of Unicode characters for things such
> as string literals and identifiers. For example the source code could be:
>
> function main
>   define AAA integer
>   define somename string
>   AAA = 2
>   somename = "BBBB"
>   CCCC(AAA, somename)
> end main
>
> function CCCC(AAA, somename)
>   if somename == "BABA"
>     return AAA
>   else
>     return 0
> end CCCC
>
> where AAA, BBBB, CCCC and BABA are all Chinese character strings.
>
> Would it be possible to get ANTLR to generate a C lexer for this? If not,
> which of the above would be possible, i.e. just the string literals?
>
> Many thanks,
> Rowan
>