[antlr-interest] Unicode RecognitionException
rashmi shenoy
rashmi.shenoy at gmail.com
Sun Jul 23 23:42:19 PDT 2006
Hi,
I have a situation in my program where I need to read strings from a file
and recognise them as words or Unicode characters.
The input is read from a UTF-8 file character by character. When the
character read is a non-English character, say Japanese, the exception shown
below is thrown. I have done the following in my antlr.g file:
1. In the lexer options: charVocabulary='\u0080'..'\ufffe';
2. TOKEN_UNICODE : ('\u0080'..'\ufffe')+ ;
3. In the rules section of the parser, I have the following:
rule :
      TOKEN_words
      {
          print("words");
      }
    | TOKEN_UNICODE
      {
          print("Unicode");
      }
    ;
But when I run my program, the following exception is thrown:
antlr.NoViableException unexpected token: \u0082
How do I fix this?
Regards
Rash
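
A note on one possible cause, offered as an assumption rather than a confirmed diagnosis: the \u0082 in the message matches the last byte of a three-byte UTF-8 sequence such as E3 81 82 (the hiragana U+3042). That would be consistent with the input reaching the lexer as undecoded bytes instead of decoded characters. The sketch below, in plain Java with no ANTLR dependency, shows the difference between the two ways of reading. The class and method names (Utf8ReadDemo, readRawBytes, readDecodedChars) are hypothetical and are not taken from the original program.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the original program.
public class Utf8ReadDemo {

    // Reads the stream one byte at a time, with no charset decoding.
    // Each byte of a multi-byte UTF-8 sequence shows up as a separate unit.
    public static List<Integer> readRawBytes(InputStream in) throws IOException {
        List<Integer> units = new ArrayList<>();
        int b;
        while ((b = in.read()) != -1) {
            units.add(b);
        }
        return units;
    }

    // Reads the stream one char at a time through a UTF-8 decoder.
    // A multi-byte sequence is combined into the character it encodes.
    public static List<Integer> readDecodedChars(InputStream in) throws IOException {
        List<Integer> units = new ArrayList<>();
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        int c;
        while ((c = reader.read()) != -1) {
            units.add(c);
        }
        return units;
    }

    public static void main(String[] args) throws IOException {
        // U+3042 (hiragana "a") is encoded in UTF-8 as the bytes E3 81 82.
        byte[] data = "\u3042".getBytes(StandardCharsets.UTF_8);

        StringBuilder raw = new StringBuilder("raw bytes:");
        for (int u : readRawBytes(new ByteArrayInputStream(data))) {
            raw.append(' ').append(Integer.toHexString(u));
        }
        System.out.println(raw);

        StringBuilder decoded = new StringBuilder("decoded chars:");
        for (int u : readDecodedChars(new ByteArrayInputStream(data))) {
            decoded.append(' ').append(Integer.toHexString(u));
        }
        System.out.println(decoded);
    }
}
```

If this is the cause, wrapping the file stream in an InputStreamReader with an explicit UTF-8 charset before handing it to the lexer may help. Whether it applies here depends on how the lexer is constructed, which the post does not show.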