[antlr-interest] Does ANTLR exactly allow Unicode?

Tommy Nordgren tommy.nordgren at chello.se
Sun Oct 22 04:48:10 PDT 2006


On 22 Oct 2006, at 11.59, 新买 wrote:

> I created a simple grammar to study ANTLR, using Chinese
> characters as letters, and ANTLR gave no warnings or errors.
> However, when I feed it a short demo input like the one below:
>
> 开始
> 输出 "开始开始";
> 结束
>
> it reports an awful error:
> line 1:1: unexpected char: 0xBF
>  at LearnLexer.nextToken(LearnLexer.java:102)
>  at antlr.TokenBuffer.fill(TokenBuffer.java:69)
>  at antlr.TokenBuffer.LT(TokenBuffer.java:86)
>  at antlr.LLkParser.LT(LLkParser.java:56)
>  at LearnParser.multiWriteStatement(LearnParser.java:89)
>  at Test.main(Test.java:18)
>
> Tracing the lexer, I found something interesting: the character "开"
> is "\u5f00", but the error reports 0xBF.
> Could somebody tell me how to use Unicode with ANTLR correctly? Thanks a lot.
	You need to set up your input (character) stream to use the correct
encoding when converting from its input (byte) stream.
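A minimal sketch of that advice in plain Java, assuming the source file is UTF-8 encoded (the post does not say which encoding was used). A value like 0xBF is a single byte, not the character U+5F00, which suggests the lexer was reading raw bytes. The example decodes the same bytes twice: once with the correct charset through an InputStreamReader, and once with ISO-8859-1, which maps each byte to one character, roughly what a lexer sees when handed the byte stream directly. The class EncodingDemo and its methods openReader and decode are hypothetical helpers, not part of ANTLR. Passing the Reader to the generated lexer (for example `new LearnLexer(reader)`) assumes the generated class has a Reader constructor, as ANTLR 2 lexers normally do; Test.java is not shown, so check it against your own code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    /** Hypothetical helper: wraps a byte stream in a Reader that decodes with the given charset. */
    static Reader openReader(InputStream in, Charset charset) {
        return new InputStreamReader(in, charset);
    }

    /** Hypothetical helper: reads every character the decoding Reader produces into a String. */
    static String decode(InputStream in, Charset charset) {
        try (Reader reader = openReader(in, charset)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The first input line (U+5F00 U+59CB) as UTF-8 bytes, standing in for the file on disk.
        // Unicode escapes keep this source file ASCII-only.
        byte[] bytes = "\u5f00\u59cb".getBytes(StandardCharsets.UTF_8);

        // Decoded with the right charset: 2 characters, the first is U+5F00.
        String good = decode(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        // Decoded one character per byte: 6 characters, none of them U+5F00.
        String bad = decode(new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1);

        System.out.printf("UTF-8:      %d chars, first = U+%04X%n", good.length(), (int) good.charAt(0));
        System.out.printf("ISO-8859-1: %d chars, first = U+%04X%n", bad.length(), (int) bad.charAt(0));

        // Hypothetical usage with a generated ANTLR 2 lexer (not compiled here):
        //   LearnLexer lexer = new LearnLexer(
        //       openReader(new FileInputStream(path), StandardCharsets.UTF_8));
    }
}
```

Running main prints 2 characters starting at U+5F00 for UTF-8 and 6 characters starting at U+00E5 for ISO-8859-1. The design choice is simply to name the charset explicitly instead of relying on the platform default, so the same file lexes identically on every machine.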

