[antlr-interest] Does ANTLR exactly allow Unicode?
新买
inshua at gmail.com
Sun Oct 22 02:59:59 PDT 2006
I have created a simple grammar to study ANTLR. It uses Chinese characters as
letters, and ANTLR reports no warning or error when processing it.
However, when I feed it a short demo input like the one below:
开始
输出 "开始开始";
结束
it reports an awful error:
line 1:1: unexpected char: 0xBF
at LearnLexer.nextToken(LearnLexer.java:102)
at antlr.TokenBuffer.fill(TokenBuffer.java:69)
at antlr.TokenBuffer.LT(TokenBuffer.java:86)
at antlr.LLkParser.LT(LLkParser.java:56)
at LearnParser.multiWriteStatement(LearnParser.java:89)
at Test.main(Test.java:18)
Tracing the lexer, I found something interesting: the character "开" is "\u5f00", but
the error reports it as 0xBF.
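One way to narrow this down is to check which byte encodings of "\u5f00" start with 0xBF. The sketch below is only a diagnostic aid, not part of the grammar; the class name EncodingCheck and the helper firstByte are hypothetical, and it assumes the JDK in use ships the GBK charset. If the GBK result matches the value in the error message, the lexer is probably being handed raw bytes of a GBK-encoded file rather than decoded characters.

```java
import java.nio.charset.Charset;

public class EncodingCheck {
    // Hypothetical helper: returns the first byte of `text` when encoded
    // with `charsetName`, as an unsigned value in the range 0..255.
    static int firstByte(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        return bytes[0] & 0xFF;
    }

    public static void main(String[] args) {
        String kai = "\u5f00"; // the character the lexer failed on
        // Print the leading byte under two common encodings for comparison.
        System.out.printf("GBK:   0x%02X%n", firstByte(kai, "GBK"));
        System.out.printf("UTF-8: 0x%02X%n", firstByte(kai, "UTF-8"));
    }
}
```

Comparing the printed values with the 0xBF in the error message shows which encoding the input file is most likely stored in.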
Could somebody tell me exactly how to use Unicode with ANTLR? Thanks a lot.
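A possible direction, sketched under assumptions: Test.java is not shown here, so this assumes it builds the lexer from a raw InputStream, in which case each byte would reach the lexer as a separate character. Decoding the stream through a java.io.Reader with an explicit charset yields real Unicode characters instead. The class ReaderDemo and the helper firstChar are hypothetical names, and "GBK" is an assumed file encoding; the lexer call in the comment is an untested suggestion, not a confirmed fix.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class ReaderDemo {
    // Hypothetical helper: decodes `in` with `charsetName` and returns the
    // first decoded character (or -1 if the stream is empty).
    static int firstChar(InputStream in, String charsetName) {
        try {
            Reader reader = new InputStreamReader(in, Charset.forName(charsetName));
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulate a GBK-encoded input file that begins with "\u5f00\u59cb".
        byte[] gbk = "\u5f00\u59cb".getBytes(Charset.forName("GBK"));
        int c = firstChar(new ByteArrayInputStream(gbk), "GBK");
        System.out.printf("first decoded char: 0x%04X%n", c);
        // With a real file, the same kind of Reader could be passed to the
        // generated lexer (assumption, untested here):
        //   new LearnLexer(new InputStreamReader(new FileInputStream(path), "GBK"));
    }
}
```

The Reader-based constructor is used here because it lets the caller choose the charset explicitly instead of relying on byte-by-byte input.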
header{
import java.util.*;
}
class LearnLexer extends Lexer;
options{
charVocabulary = '\u0003' .. '\uFFFE';
caseSensitive = false;
k = 2;
}
String :
'\"' (~'\"')* '\"'
;
YINHAO :
'\"';
WS : (' '
| '\t'
| '\n'
| '\r')
{ _ttype = Token.SKIP; }
;
WRITE:
"\u8f93\u51fa"
;
Fenhao : ';'
;
BEGIN : "\u5f00\u59cb"
;
END : "\u5b8c\u6bd5"
;
class LearnParser extends Parser;
options{
buildAST = true;
}
writeStatement :
WRITE^ String Fenhao!;
multiWriteStatement :
BEGIN^ (writeStatement)* END!
;
class LearnTreeWalker extends TreeParser;
multiWriteStatement{
int i;
}
: #(a:BEGIN .) {
for(AST t = a.getFirstChild(); t != null; t = t.getNextSibling()){
writeStatement(t);
}
}
;
writeStatement{
String s;
}
: #(WRITE s=string) {System.out.print(s);}
;
string returns[String r]{
r = null;
}
: s : String {r = s.getText();}
;