[antlr-interest] About literal supports unicode

Sat Jun 13 09:45:22 PDT 2009

Ha Luong wrote:
> Dear all,
>
> I tried to use the following grammar to accept a Unicode string:
> //modify T.g in the example source of ANTLR book
> grammar T;
> options {
>     language=Java;
> }
> @members {
> String s;
> }
> r : ID '#' {s = $ID.text; System.out.println("found "+s);} ;
> ID: ('a'..'z'|'\u00e0')+ ; //\u00e0
> WS: (' '|'\n'|'\r')+ {skip();} ; // ignore whitespace
>
> and do these commands in cygwin:
> java org.antlr.Tool T.g
> javac *.java
>
> If I test the literal 'a', it is ok
> java Test
> a #
> ^Z
> found a
>
> but with the literal 'à', I get an error:
How are you opening your input stream? Have you opened it with the
encoding that the input file was saved in, such as UTF-32, UTF-8, or
something else?
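
To illustrate why the encoding matters, here is a minimal sketch in plain Java; it is not taken from the original test rig, and the class name EncodingDemo and the helper readAll are hypothetical. It decodes the same bytes for "à #" twice, once as UTF-8 and once as ISO-8859-1. Only the UTF-8 decoding yields the single character U+00E0 that the ID rule accepts; the wrong charset yields two other characters, which the lexer would reject.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class EncodingDemo {
    // Hypothetical helper: decode every byte of the stream with the named charset.
    static String readAll(InputStream in, String encoding) throws IOException {
        Reader reader = new InputStreamReader(in, Charset.forName(encoding));
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "à #" saved as UTF-8: U+00E0 is encoded as the two bytes 0xC3 0xA0.
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA0, 0x20, 0x23};

        String right = readAll(new ByteArrayInputStream(utf8), "UTF-8");
        String wrong = readAll(new ByteArrayInputStream(utf8), "ISO-8859-1");

        // Matching charset: 3 chars, the first is U+00E0, which ID accepts.
        System.out.println(right.length() + " " + Integer.toHexString(right.charAt(0)));
        // Mismatched charset: 4 chars, the first is U+00C3, which ID does not accept.
        System.out.println(wrong.length() + " " + Integer.toHexString(wrong.charAt(0)));
    }
}
```

In an ANTLR 3 test rig the same idea would presumably mean passing the encoding name when constructing the character stream, for example new ANTLRInputStream(System.in, "UTF-8"), rather than relying on the platform default; check the constructors available in the runtime version you are using.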


Jim