[antlr-interest] Unicode RecognitionException

rashmi shenoy rashmi.shenoy at gmail.com
Sun Jul 23 23:42:19 PDT 2006


Hi,

My program needs to read strings from a file and recognise them as either
words or Unicode characters.
The input is read character by character from a UTF-8 file. When the
character read is non-English, for example Japanese, an exception is thrown
(details below). I have done the following in my antlr.g file:

1. In the lexer options: charVocabulary='\u0080'..'\ufffe';
2. TOKEN_UNICODE : ('\u0080'..'\ufffe')+   ;
3. In the rules section of the parser, I have the following:

 rule :
     TOKEN_words
       {
            print("words");
       }

     |  TOKEN_UNICODE
       {
           print("Unicode");
       }

But when I run my program, antlr.NoViableException (unexpected token:
\u0082) is thrown.
How do I fix this?
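
For reference, here is a minimal Java sketch of the "read a UTF-8 file char by char" step. It is an illustration under assumptions, not a diagnosis: it assumes the Java target, and the class and method names (Utf8ReadDemo, readUnits, readChars) are hypothetical. It shows that a Japanese character such as U+3042 is three bytes in UTF-8 (E3 81 82), so a byte-wise read yields values like 0x82, whereas reading through an InputStreamReader with an explicit UTF-8 charset yields the single character. If the lexer is being fed an undecoded stream, that could explain seeing \u0082.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8ReadDemo {
    // Reads the stream one byte at a time, with no charset decoding.
    public static List<Integer> readUnits(InputStream in) {
        List<Integer> out = new ArrayList<>();
        try {
            int b;
            while ((b = in.read()) != -1) {
                out.add(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    // Reads the stream one char at a time after decoding it as UTF-8.
    public static List<Integer> readChars(InputStream in) {
        List<Integer> out = new ArrayList<>();
        try (Reader r = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                out.add(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hiragana "a" (U+3042), used here only as sample Japanese input.
        byte[] utf8 = "\u3042".getBytes(StandardCharsets.UTF_8);
        System.out.println(readUnits(new ByteArrayInputStream(utf8))); // [227, 129, 130]
        System.out.println(readChars(new ByteArrayInputStream(utf8))); // [12354]
    }
}
```

In the real program the Reader would wrap a FileInputStream and be passed to the generated lexer's Reader constructor instead of being drained in a loop.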

Regards
Rash