[antlr-interest] Does Java.g [version 1.0.6] handle unicode characters?

Roberto Mannai robermann at gmail.com
Fri Aug 27 01:08:16 PDT 2010


Hello

I'm trying to find out whether the Java grammar at
http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g
handles Unicode escape sequences correctly.

In the file's header I read:
<<
 *  Know problems:
 *    Won't pass input containing unicode sequence like this
 *      char c = '\uffff'
 *      String s = "\uffff";
 *    Because Antlr does not treat '\uffff' as an valid char. This will be fixed in the next Antlr
 *    release. [Fixed in Antlr-3.1.1]
>>

So it seems that ANTLR 3.2 should handle Unicode escapes. However,
when I try to parse the following class:

public class TestUnicode {
        public static void test (String[] args){
                char c = '\uffff';
        }
}

I get the following errors:
      line 3:27 no viable alternative at character 'u'
      line 3:34 mismatched character '\r' expecting '''
      line 1:7 mismatched input 'class' expecting MONKEYS_AT
      line 2:22 mismatched input 'void' expecting MONKEYS_AT
      line 3:21 mismatched input 'c' expecting DOT
      line 3:23 no viable alternative at input '='
      line 4:8 no viable alternative at input '}'
      line 4:8 no viable alternative at input '}'

If I replace the Unicode escape with an ordinary character, the class
parses without errors. Am I missing anything? Please note that version
1.0.5 of the grammar didn't have this problem.
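
To check whether the escape itself is the trigger, one option is to
translate Unicode escapes before the text reaches the lexer, as the
Java Language Specification describes (section 3.3). The sketch below
is only an illustration of that idea: the class and method names are
hypothetical, it is not part of Java.g or ANTLR, and I have not
verified it against grammar version 1.0.6.

```java
// Hypothetical helper, not part of Java.g or ANTLR.
// Translates \uXXXX escapes to the characters they denote, following
// the eligibility rule of JLS section 3.3: a backslash starts an escape
// only if it is preceded by an even number of contiguous backslashes.
final class UnicodeEscapeTranslator {

    static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int n = src.length();
        int backslashes = 0; // contiguous raw backslashes just before position i
        int i = 0;
        while (i < n) {
            char ch = src.charAt(i);
            if (ch == '\\' && backslashes % 2 == 0
                    && i + 1 < n && src.charAt(i + 1) == 'u') {
                int j = i + 1;
                // The JLS allows one or more 'u' characters after the backslash.
                while (j < n && src.charAt(j) == 'u') {
                    j++;
                }
                if (isFourHexDigits(src, j)) {
                    out.append((char) Integer.parseInt(src.substring(j, j + 4), 16));
                    i = j + 4;
                    backslashes = 0;
                    continue;
                }
                // A malformed escape is a compile-time error in Java;
                // this sketch simply copies it through unchanged.
            }
            out.append(ch);
            backslashes = (ch == '\\') ? backslashes + 1 : 0;
            i++;
        }
        return out.toString();
    }

    private static boolean isFourHexDigits(String s, int start) {
        if (start + 4 > s.length()) {
            return false;
        }
        for (int k = start; k < start + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

Passing the result of UnicodeEscapeTranslator.translate(source) to the
lexer instead of the raw text would show whether the remaining errors
come from the escape handling or from something else in the grammar.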

Thanks for your help.

Roberto

