[antlr-interest] RES: Accentuated chars in brazilian portuguese

Nilo Roberto C Paim nilopaim at gmail.com
Sat Jun 4 08:19:00 PDT 2011

Thanks to Bart and Douglas for the hints.


I’ve discovered a lot of things in this process, including the encoding of
my input file.
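
A minimal sketch of how an encoding mismatch can produce this kind of lexer failure. It assumes, and this is not confirmed anywhere in the thread, that the input file was saved as UTF-8 but decoded as Latin-1. It is written in Python purely for illustration; the helper in_word_range is hypothetical and only mirrors the character classes of the WORD rule quoted below.

```python
def in_word_range(ch):
    """Mirror the character classes of the WORD rule (hypothetical helper)."""
    return ("\u00c0" <= ch <= "\u00ff" or "a" <= ch <= "z"
            or "A" <= ch <= "Z" or ch == "-")

word = "n\u00e3o"                    # the word "nao" with a tilde on the a
utf8_bytes = word.encode("utf-8")    # the a-tilde becomes two bytes: C3 A3

decoded_ok = utf8_bytes.decode("utf-8")        # matches the file's encoding
decoded_wrong = utf8_bytes.decode("latin-1")   # assumed mismatch

# With the right encoding every character is accepted by the rule.
print([in_word_range(c) for c in decoded_ok])

# With the wrong one, the a-tilde turns into U+00C3 followed by U+00A3,
# and U+00A3 lies below '\u00c0', so no alternative of the rule matches it.
print([f"U+{ord(c):04X}" for c in decoded_wrong])
print([in_word_range(c) for c in decoded_wrong])
```

If this is the cause, the grammar itself is fine, and the fix is to give the character stream the file's actual encoding when it is created.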


I’m on my way now.


Thanks all.



From: Douglas Godfrey [mailto:douglasgodfrey at gmail.com]
Sent: Thursday, June 2, 2011 04:48
To: Nilo Roberto C Paim
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Accentuated chars in brazilian portuguese


Look up the Latin Unicode blocks on Wikipedia and add the code points for
accented Latin-1 characters to your WORD rule.

Latin1_Supplement                   :   '\u00A0' .. '\u00FF';
Latin_ExtendedA                     :   '\u0100' .. '\u017F';
Latin_ExtendedB                     :   '\u0180' .. '\u024F';
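
As a quick check of which of these blocks the Portuguese accented letters fall into, here is a small Python sketch. Python is used only for illustration; the names BLOCKS and block_of are hypothetical and not part of ANTLR. The ranges are copied from the three rules above.

```python
# Code point ranges copied from the three rules above.
BLOCKS = {
    "Latin1_Supplement": (0x00A0, 0x00FF),
    "Latin_ExtendedA": (0x0100, 0x017F),
    "Latin_ExtendedB": (0x0180, 0x024F),
}

def block_of(ch):
    """Return the name of the block containing ch, or None (e.g. for ASCII)."""
    cp = ord(ch)
    for name, (lo, hi) in BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None

# a-tilde, c-cedilla, e-acute, o-tilde, written as escapes so the
# source file's own encoding cannot affect the result.
for ch in "\u00e3\u00e7\u00e9\u00f5":
    print(f"U+{ord(ch):04X} -> {block_of(ch)}")
```

All four letters land in Latin1_Supplement, so for Portuguese text the two extended blocks are optional extras rather than a requirement.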

On Wed, Jun 1, 2011 at 4:53 PM, Nilo Roberto C Paim <nilopaim at gmail.com> wrote:

Hi all,

I'm a newbie with ANTLR and I'm facing a problem when trying to parse text
that contains accented characters in Brazilian Portuguese.

I've put a word definition in my grammar as follows:

               WORD : ( '\u00c0'..'\u00ff' | 'a'..'z' | 'A'..'Z' | '-' )+ ;

But parsing fails. Words like "não" ("no" in Portuguese) cause the lexer to
throw "Antlr.Runtime.NoViableAltException".

I'm trying to use C#.

Any hint?


Nilo, from Brasil...

List: http://www.antlr.org/mailman/listinfo/antlr-interest

