[antlr-interest] Parsing files in different charsets

charon_hades charon_hades at yahoo.com
Mon Sep 30 23:05:49 PDT 2002


   Hi,

how can I make ANTLR parse files whose codepage differs from the 
system codepage? To be specific, my system codepage is Cp1250, while the 
files contain strings in Cp852; the identifiers are plain ASCII only.
My problem is that the strings returned by the getText method 
contain unrecognized characters.

Is it enough to give the ANTLR-generated lexer a java.io.Reader that reads Cp852? 
How will string tokens be encoded when I call getText on them? 
If I am correct, then with this setup all characters and tokens 
listed in the grammar files have to be written in Cp852?
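
To make the question concrete, here is a minimal sketch of the Reader 
approach I have in mind. It uses only the JDK; the class and method names 
are made up for illustration, "MyLexer" in the comment is a placeholder for 
whatever lexer ANTLR generates, and I am assuming the JDK ships the IBM852 
(Cp852) and windows-1250 (Cp1250) charsets. As far as I understand, the 
Reader decodes the bytes into Unicode chars, so Java strings built from it 
should no longer depend on the source codepage.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class Cp852ReaderSketch {

    // Wrap a byte stream in a Reader that decodes with an explicit charset
    // instead of the platform default (Cp1250 on my machine).
    static Reader openReader(InputStream in, String charsetName) {
        return new InputStreamReader(in, Charset.forName(charsetName));
    }

    // Drain a Reader into a String, roughly what a lexer would see char by char.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Simulate file content: "c with caron" (U+010D) encoded as Cp852 bytes.
        String original = "\u010D";
        byte[] cp852Bytes = original.getBytes(Charset.forName("IBM852"));

        // Decoding with the matching charset restores the original character.
        String right = readAll(openReader(new ByteArrayInputStream(cp852Bytes), "IBM852"));
        // Decoding the same bytes as Cp1250 yields a different character.
        String wrong = readAll(openReader(new ByteArrayInputStream(cp852Bytes), "windows-1250"));

        System.out.println(right.equals(original));
        System.out.println(wrong.equals(original));

        // Assumption: the generated lexer accepts a Reader in its constructor,
        // e.g. new MyLexer(openReader(fileStream, "IBM852")).
    }
}
```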

Or is it better to translate the whole input stream into UTF-8 and 
write the grammar file in that encoding as well?

Thanks.


 
