[antlr-interest] Unicode Support

Rowan Woodhouse rowan at querix.com
Wed Jul 5 03:46:37 PDT 2006


Hi,

I've been looking through the archives, the web site, etc. to try to figure this out, but I haven't been able to come up with a definite answer, so here goes.

I am looking at writing a lexer in C/C++ that can handle ASCII- or Unicode-encoded input files and allows Unicode characters in things such as string literals and identifiers. For example, the source code could be:

function main
  define AAA integer
  define somename string
  AAA = 2
  somename = "BBBB"
  CCCC(AAA, somename)
end main

function CCCC(AAA, somename)
  if somename == "BABA"
    return AAA
  else
    return 0
end CCCC

where AAA, BBBB, CCCC and BABA all stand for strings of Chinese characters.

Would it be possible to get ANTLR to generate a C lexer for this? If not, which of the above would be possible, i.e. just the string literals?
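
To make the requirement concrete, here is a rough hand-written sketch of the kind of identifier scanning I have in mind. It is not ANTLR output and does not use any ANTLR API. It assumes the input is UTF-8 and, purely for illustration, treats only ASCII letters, underscore and the CJK Unified Ideographs block (U+4E00..U+9FFF) as identifier characters. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (n bytes available).
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 if the input is empty, truncated or malformed. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    assert(s != NULL && cp != NULL);
    if (n == 0) return 0;

    c = s[0];
    if (c < 0x80) { *cp = c; return 1; }                        /* plain ASCII */
    else if ((c & 0xE0) == 0xC0) { len = 2; c &= 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { len = 3; c &= 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { len = 4; c &= 0x07; min = 0x10000; }
    else return 0;              /* stray continuation or invalid lead byte */

    if (n < len) return 0;      /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;

    *cp = c;
    return len;
}

/* Assumed identifier rule (illustration only): ASCII letters, '_' and
 * CJK Unified Ideographs may start an identifier. */
int is_ident_start(uint32_t cp)
{
    return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') ||
           cp == '_' || (cp >= 0x4E00 && cp <= 0x9FFF);
}

/* After the first character, ASCII digits are allowed as well. */
int is_ident_part(uint32_t cp)
{
    return is_ident_start(cp) || (cp >= '0' && cp <= '9');
}

/* Return the length in bytes of the identifier at the start of src,
 * or 0 if src does not begin with an identifier. */
size_t scan_identifier(const char *src, size_t n)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t pos = 0;

    while (pos < n) {
        uint32_t cp;
        size_t used = utf8_decode(s + pos, n - pos, &cp);
        if (used == 0) break;                            /* bad encoding */
        if (pos == 0 ? !is_ident_start(cp) : !is_ident_part(cp)) break;
        pos += used;
    }
    return pos;
}
```

Called on the bytes of a line such as "AAA = 2", scan_identifier would return the byte length of the leading identifier, whether AAA is ASCII or Chinese text. A real lexer would presumably need a much wider character table than the single range assumed here.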

Many thanks,
Rowan

