[antlr-interest] accented chars in lexer make gcj fail

Mathieu Clabaut mathieu.clabaut at gmail.com
Mon Feb 7 07:44:25 PST 2005


Hello,

The following lexer grammar:
CHAR  : ('a'..'z'|'A'..'Z'|'_'| '-'
         | 'é' | 'è' | 'ê' | 'ë'
         | 'á' | 'à' | 'â' | 'ä'
         | 'ú' | 'ù' | 'û' | 'ü'          | 'î' | 'ï'
         | 'ô' | 'ö' );

gets translated into the following generated code:
               case '-':
               {
                        match('-');
                        break;
               }
               case '\u00e9':
               {
                        match('é');
                        break;
               }

 It works well when compiling to Java bytecode with javac, but gcj
complains about the accented character 'é':
   GraphesLexer.java:671: error: unrecognized character in input stream.

 If I replace 'é' with '\u00e9', it works like a charm.
 Is there any reason why 'é' is used instead of '\u00e9'?
 (It may be a gcj bug, but the difference between the case label and
the match() parameter looks strange to a newbie like me.)
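For illustration, one way a code generator could avoid the mismatch is to emit every non-ASCII character as a Java Unicode escape in both the case label and the match() call, so the generated source is pure ASCII and (presumably) no longer depends on which source encoding the compiler assumes. This is a minimal sketch under that assumption; `toJavaCharLiteral` and `CharLiteralEscaper` are hypothetical names, not part of ANTLR's API:

```java
public class CharLiteralEscaper {

    // Hypothetical helper: renders a char as a Java char literal whose
    // text is plain ASCII. Printable ASCII is emitted as-is; quote,
    // backslash and common control chars get their usual escapes;
    // anything else becomes a unicode escape (backslash-u + 4 hex digits).
    static String toJavaCharLiteral(char c) {
        switch (c) {
            case '\'': return "'\\''";
            case '\\': return "'\\\\'";
            case '\n': return "'\\n'";
            case '\r': return "'\\r'";
            case '\t': return "'\\t'";
            default:
                if (c >= 0x20 && c < 0x7f) {
                    return "'" + c + "'";
                }
                // Lowercase hex, zero-padded to four digits.
                return String.format("'\\u%04x'", (int) c);
        }
    }

    public static void main(String[] args) {
        char eAcute = '\u00e9'; // the accented 'e' from the grammar
        String lit = toJavaCharLiteral(eAcute);
        // Both generated fragments now use the same ASCII-only literal.
        System.out.println("case " + lit + ":");
        System.out.println("match(" + lit + ");");
    }
}
```

With this approach the case label and the match() argument are produced by the same function, so they cannot diverge the way they do in the generated code above.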

 -mat 
-- 
________________http://www.gnu.org/philosophy/no-word-attachments.fr.html
Mathieu CLABAUT                            mailto:mathieu.clabaut at free.fr
           F2F5 442F F2AC E1D5 9D31  3EFC 842A BC4A 123B 9A65

