[antlr-interest] Does ANTLR exactly allow Unicode?
Tommy Nordgren
tommy.nordgren at chello.se
Sun Oct 22 04:48:10 PDT 2006
On 22 Oct 2006, at 11:59, 新买 wrote:
> I created a simple grammar to study ANTLR, using Chinese
> characters as letters, and ANTLR reported no warning or error.
> However, when I feed it a piece of demo input like the one below:
>
> 开始
> 输出 "开始开始";
> 结束
>
> it reports an awful error:
> line 1:1: unexpected char: 0xBF
> at LearnLexer.nextToken(LearnLexer.java:102)
> at antlr.TokenBuffer.fill(TokenBuffer.java:69)
> at antlr.TokenBuffer.LT(TokenBuffer.java:86)
> at antlr.LLkParser.LT(LLkParser.java:56)
> at LearnParser.multiWriteStatement(LearnParser.java:89)
> at Test.main(Test.java:18)
>
> Tracing the lexer, I found something interesting: the char "开" is
> "\u5f00", but it is reported as 0xBF.
> Could somebody tell me exactly how to use Unicode with ANTLR? Thanks a lot.
You need to set up your input (character) stream to use the correct
encoding when converting from its input (byte) stream.
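As a rough illustration of the point, here is a minimal sketch in plain Java. It assumes the input is UTF-8; the class name EncodingDemo and the helper readAll are made up for this example, and the real encoding of your file may be different (for instance GBK, where the first byte of "开" is 0xBF, which would match the reported error). The sketch decodes the same bytes once with the right charset and once with a wrong one, to show where the stray byte values come from.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Hypothetical helper: decode a whole byte stream into a String
    // using an explicitly chosen charset instead of reading raw bytes.
    static String readAll(InputStream bytes, Charset charset) {
        Reader chars = new InputStreamReader(bytes, charset);
        StringBuilder out = new StringBuilder();
        try {
            int c;
            while ((c = chars.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "\u5f00\u59cb" is the keyword from the demo input, encoded here as UTF-8
        // (an assumption; substitute the encoding your file really uses).
        byte[] utf8 = "\u5f00\u59cb".getBytes(StandardCharsets.UTF_8);

        // Correct charset: two characters come back, the first being U+5F00.
        String ok = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);

        // Wrong charset: every byte becomes its own "character".
        String bad = readAll(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println(ok.length() + " chars when decoded as UTF-8");
        System.out.println(bad.length() + " chars when decoded as ISO-8859-1");

        // With a generated lexer, the same idea would look roughly like
        //   new LearnLexer(new InputStreamReader(new FileInputStream(path), charset))
        // assuming the generated LearnLexer has a constructor taking a Reader.
    }
}
```

If the lexer is built from a plain InputStream, it sees individual bytes rather than decoded characters, which is probably what happened in your test. You may also need to widen the lexer's accepted character range (in ANTLR 2 this is the charVocabulary option) so that characters outside ASCII are allowed at all.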