[antlr-interest] Parsing files in different charsets
    charon_hades 
    charon_hades at yahoo.com
       
    Mon Sep 30 23:05:49 PDT 2002
    
    
  
   Hi,
How can I make ANTLR parse files whose code page differs from the
system code page? For clarity: my system code page is Cp1250, while the
files contain strings in Cp852 (the identifiers are plain ASCII only).
My problem is that the strings returned by calling the getText method
contain unrecognized characters.
Is it enough to provide the ANTLR lexer with a java.io.Reader that reads Cp852?
How will string tokens be encoded after I call getText on them?
If I am correct, then with these settings, do all characters and tokens
listed in the grammar files have to be written in Cp852?
Or is it better to translate the whole input stream into UTF-8 and
write the grammar file in that encoding as well?
Thanks.
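The symptom comes down to which charset turns the file's bytes into Java characters. Below is a minimal sketch, assuming a JRE that provides the Cp852 and Cp1250 charsets. It encodes a sample line to Cp852 bytes, which stand in for the file on disk, and then decodes those bytes through an InputStreamReader twice: once as Cp852 and once as Cp1250, the system default in this setup. It also re-encodes the text as UTF-8, which is the alternative raised in the question. The class and method names (CharsetDemo, decode, cp852ToUtf8) are hypothetical. The ANTLR lexer itself is not shown; in the poster's proposed approach, the Cp852 Reader would be handed to it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Read every character from a Reader into a String.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Decode raw bytes with the named charset, the way a Reader passed to a lexer would.
    static String decode(byte[] bytes, String charsetName) throws IOException {
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(bytes), charsetName)) {
            return readAll(in);
        }
    }

    // Re-encode Cp852 bytes as UTF-8 (the "translate the whole input" alternative).
    static byte[] cp852ToUtf8(byte[] bytes) throws IOException {
        return decode(bytes, "Cp852").getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // ASCII identifier plus a string literal with Czech letters (c-caron, s-caron, z-caron).
        String original = "ident = \"\u010D\u0161\u017E\";";
        byte[] fileBytes = original.getBytes("Cp852"); // simulated Cp852 file contents

        String asCp852 = decode(fileBytes, "Cp852");   // correct charset
        String asCp1250 = decode(fileBytes, "Cp1250"); // system default in the question
        String viaUtf8 = decode(cp852ToUtf8(fileBytes), "UTF-8");

        System.out.println("Cp852 decode matches:  " + asCp852.equals(original));
        System.out.println("Cp1250 decode matches: " + asCp1250.equals(original));
        System.out.println("UTF-8 copy matches:    " + viaUtf8.equals(original));
    }
}
```

Only the Cp852 decode and the UTF-8 copy reproduce the original accented letters. The Cp1250 decode yields different characters, which matches the garbled getText results described above.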
 