[antlr-interest] Does ANTLR exactly allow Unicode?

新买 inshua at gmail.com
Sun Oct 22 05:44:20 PDT 2006


Hi Tommy,

Thank you very much, the problem is resolved.

By the way, in ANTLR 2.7.6 a Unicode string cannot be written as a
double-quoted literal; it must be separated into chars one by one, as
'\u5f00''\u59cb'.

Thanks.


On 10/22/06, Tommy Nordgren <tommy.nordgren at chello.se> wrote:
>
>
> On 22 okt 2006, at 11.59, 新买 wrote:
>
> > I created a simple grammar to study ANTLR, using Chinese
> > characters as letters, and ANTLR reported no warning or error.
> > However, when I feed it a piece of demo input like the one below:
> >
> > 开始
> > 输出 "开始开始";
> > 结束
> >
> > it reports an awful error:
> > line 1:1: unexpected char: 0xBF
> >  at LearnLexer.nextToken(LearnLexer.java:102)
> >  at antlr.TokenBuffer.fill(TokenBuffer.java:69)
> >  at antlr.TokenBuffer.LT(TokenBuffer.java:86)
> >  at antlr.LLkParser.LT(LLkParser.java:56)
> >  at LearnParser.multiWriteStatement(LearnParser.java:89)
> >  at Test.main(Test.java:18)
> >
> > Tracing the lexer, I found something interesting: the char "开" is
> > "\u5f00", but it is reported as 0xBF.
> > Could somebody tell me exactly how to use Unicode with ANTLR? Thanks a lot.
> You need to set up your input (character) stream to use the correct
> encoding when converting from its input (byte) stream.
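
For readers of the archive, the advice quoted above can be sketched in Java roughly as follows. This is a minimal sketch, not the code from the thread: it assumes the demo input is encoded as UTF-8, and the class name EncodingDemo and the helpers utf8Reader and readChar are hypothetical names introduced only for illustration. It shows that decoding the bytes through an InputStreamReader with an explicit charset yields the char \u5f00, while reading the stream byte by byte yields only the first encoded byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Wrap a byte stream in a Reader that decodes it as UTF-8.
    // Assumption: the input really is UTF-8; use the file's actual charset otherwise.
    static Reader utf8Reader(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    // Read one char from the reader, or -1 at end of input.
    static int readChar(Reader r) {
        try {
            return r.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The two chars from the demo input, encoded as UTF-8 bytes.
        byte[] bytes = "\u5f00\u59cb".getBytes(StandardCharsets.UTF_8);

        // Reading raw bytes gives the first byte of the UTF-8 sequence, not the char.
        int firstByte = new ByteArrayInputStream(bytes).read();
        System.out.printf("raw byte:     0x%02X%n", firstByte);

        // Reading through a decoding Reader gives the actual character.
        Reader reader = utf8Reader(new ByteArrayInputStream(bytes));
        System.out.printf("decoded char: 0x%04X%n", readChar(reader));
    }
}
```

With the generated lexer from the stack trace, the same Reader would presumably be passed to the lexer's Reader constructor instead of handing it the raw InputStream, for example new LearnLexer(utf8Reader(new FileInputStream("demo.txt"))), where demo.txt is a placeholder file name.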