[antlr-interest] Parsing files in different charsets
charon_hades
charon_hades at yahoo.com
Mon Sep 30 23:05:49 PDT 2002
Hi,
How can I make ANTLR parse files whose code page differs from the
system code page? To be specific, my system code page is Cp1250, but the
files contain strings in Cp852 (the identifiers are plain ASCII).
My problem is that the strings returned by the getText method
contain unrecognized characters.
Is it enough to provide ANTLRLexer with a java.io.Reader that reads in Cp852?
How will string tokens be encoded when I call getText on them?
If I am correct, then with these settings all characters and tokens
listed in the grammar files have to be written in Cp852?
Or is it better to translate the whole input stream into UTF-8 and
write the grammar file in that encoding as well?
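
To make clear what I mean by a Reader reading in Cp852, here is a minimal
sketch using only the JDK. The class name Cp852ReaderSketch and the helper
decode are made up for illustration; no ANTLR class is used, and I am only
assuming that such a Reader could be handed to the lexer instead of a raw
InputStream. It also assumes the JRE ships the Cp852 (canonical name IBM852)
and Cp1250 charsets.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class Cp852ReaderSketch {

    // Hypothetical helper: reads all bytes of 'in' through a Reader that
    // decodes them with the named charset. The resulting String holds
    // Unicode characters, independent of the system code page.
    static String decode(InputStream in, String charsetName) throws IOException {
        Reader reader = new InputStreamReader(in, charsetName);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "\u010Daj" is a short word starting with a c-with-caron,
        // stored here as Cp852 bytes to stand in for the input file.
        String original = "\u010Daj";
        byte[] fileBytes = original.getBytes("Cp852");

        // Decoding with the file's real code page recovers the text.
        String right = decode(new ByteArrayInputStream(fileBytes), "Cp852");
        // Decoding with the system code page (Cp1250 in my case) does not.
        String wrong = decode(new ByteArrayInputStream(fileBytes), "Cp1250");

        System.out.println("Cp852 round trip ok: " + original.equals(right));
        System.out.println("Cp1250 decoding ok:  " + original.equals(wrong));
    }
}
```

If the lexer accepts such a Reader, I would expect it to see the already
decoded characters rather than the raw Cp852 bytes, but that is exactly the
part I am unsure about.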
Thanks.