[antlr-interest] c++ Unicode

Peggy Fieland madcapmaggie at yahoo.com
Mon Apr 24 12:41:50 PDT 2006


I get my input in UTF-8. I am using ANTLR 2.7.5 and
have modified it to handle UTF-8, using the example in
the examples directory as a model.

I don't know what the state of C++ Unicode support is
in ANTLR 2.7.6.
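
For anyone wondering what "handling UTF-8" involves at the byte level, here is
a minimal sketch of a decoder that turns a UTF-8 byte string into Unicode code
points. It is not the code from the ANTLR examples directory and does not use
any ANTLR API; the function name decodeUtf8 is hypothetical and only meant as
an illustration. It assumes the whole input is already in memory, and it does
not reject overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of ANTLR): decode UTF-8 bytes into code points.
// Malformed or truncated sequences are replaced by U+FFFD.
// Simplification: overlong forms and surrogates are not rejected.
std::vector<std::uint32_t> decodeUtf8(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(0xFFFD);
            ++i;
            continue;
        }
        ++i;

        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            // Each continuation byte must look like 10xxxxxx.
            if (i >= bytes.size() ||
                (static_cast<unsigned char>(bytes[i]) & 0xC0) != 0x80) {
                ok = false;
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : 0xFFFD);
    }
    return out;
}
```

With this sketch, the two UTF-8 bytes 0xC3 0xB1 decode to the single code
point U+00F1, while a lone ISO-8859-1 byte 0xF1 is not valid UTF-8 and comes
out as U+FFFD. That kind of mismatch between the file's encoding and what the
reader expects is one possible reason for input and output characters to
differ.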

Peggy

--- pepone pepone <pepone.onrez at gmail.com> wrote:

> Hi all
> 
> I have the same problems getting Unicode working with C++.
> 
> I changed my lexer rules as Peggy said, and now my
> lexer and parser
> compile OK, but they don't work as expected.
> 
> In cpp/examples/unicode, test.in is encoded as
> ISO-8859-1 and not as
> UTF-8. Is it not possible to have the input encoded as
> UTF-8 or UTF-16?
> 
> If I encode my input as ISO-8859-1, I don't see the
> same characters
> in the output as in the input.
> 
> For example, if I input '�', I get '� '. This
> happens with my
> lexer as well as with the unicode example.
> 
> What is the way to get this working? I want to input
> '�' and see '�' in
> the output.
> 
> Any ideas on how to solve this?
> 
> On 4/23/06, Peggy Fieland <madcapmaggie at yahoo.com>
> wrote:
> > Yes, if you have something like:
> >
> > GE: ">="
> >
> > you'll have to change it to:
> >
> > GE:  '>' '='
> >
> > There may be another way, but that one worked for me.
> >
> >
> >
> > --- pepone pepone <pepone.onrez at gmail.com> wrote:
> >
> > > I am trying to add Unicode support to my lexer, based on
> > > example/cpp/unicode.
> > >
> > > When I add the Unicode char vocabulary
> > > charVocabulary='\u0000'..'\uFFFE';
> > >
> > >
> > > and try to compile the lexer, I get the following error:
> > >
> > >
> > > WikiLexer.cpp: In member function `void WikiLexer::mDOCUMENT(bool)':
> > > WikiLexer.cpp:192: error: invalid conversion from `const wchar_t*' to `unsigned int'
> > > WikiLexer.cpp:192: error:   initializing argument 1 of `antlr::BitSet::BitSet(unsigned int)'
> > > WikiLexer.cpp: In member function `void WikiLexer::mSECTION_1_TAG(bool)':
> > > WikiLexer.cpp:206: error: invalid conversion from `const wchar_t*' to `unsigned
> > >
> > >
> > > Any ideas
> > > Thanks
> > > --
> > > play tetris http://pepone.on-rez.com/tetris
> > > run gentoo http://gentoo-notes.blogspot.com/
> > >
> >
> >
> 
> 
> --
> play tetris http://pepone.on-rez.com/tetris
> run gentoo http://gentoo-notes.blogspot.com/
> 


