[antlr-interest] RE: Accentuated chars in brazilian portuguese

Nilo Roberto C Paim nilopaim at gmail.com
Sat Jun 4 08:19:00 PDT 2011


Thanks to Bart and Douglas for the hints.

I’ve discovered a lot of things in this process, including the encoding of
my input file.
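
The message does not say what the encoding problem turned out to be. One common cause, assumed here purely for illustration, is a UTF-8 file being read as ISO-8859-1. The sketch below is in Java rather than the C# used in the thread, and `EncodingMismatchDemo`, `isWordChar` and `firstRejected` are hypothetical names. It mirrors the character set of the WORD rule from the original question and reports the first character that rule would reject.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of ANTLR or of the original thread.
final class EncodingMismatchDemo {

    // Mirrors the WORD rule's character set from the original question:
    // U+00C0..U+00FF, a..z, A..Z and the hyphen.
    static boolean isWordChar(char c) {
        return (c >= '\u00c0' && c <= '\u00ff')
            || (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || c == '-';
    }

    // Returns the index of the first character WORD would reject, or -1 if none.
    static int firstRejected(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isWordChar(s.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The word "nao" with a tilde on the 'a', written with an escape
        // so the example does not depend on the source file's encoding.
        String word = "n\u00e3o";
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in keeps U+00E3 intact.
        String decodedRight = new String(utf8, StandardCharsets.UTF_8);
        // Assumed failure mode: the two UTF-8 bytes C3 A3 become U+00C3 and U+00A3.
        String decodedWrong = new String(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(firstRejected(decodedRight)); // prints -1
        System.out.println(firstRejected(decodedWrong)); // prints 2
    }
}
```

Under this assumption the second decoded character, U+00A3, falls outside every range in WORD, which would leave the lexer without a viable alternative at that position. Decoding the input with the charset the file was actually saved in avoids the problem.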

 

I’m on my way now.

Thanks, all.

From: Douglas Godfrey [mailto:douglasgodfrey at gmail.com]
Sent: Thursday, June 2, 2011 04:48
To: Nilo Roberto C Paim
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Accentuated chars in brazilian portuguese

Look up the Latin Unicode blocks on Wikipedia and add the Unicode code
points for accented Latin-1 characters to your WORD rule.

fragment
Latin1_Supplement                   :   '\u00A0' .. '\u00FF';
fragment
Latin_ExtendedA                     :   '\u0100' .. '\u017F';
fragment
Latin_ExtendedB                     :   '\u0180' .. '\u024F';
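
To check which characters these three fragments would admit, the ranges can be mirrored in a few lines of Java. This is only a sketch: the thread targets C#, and `LatinRanges` and `fragmentFor` are hypothetical names, not ANTLR API. It assumes the fragments are then referenced from WORD alongside the ASCII ranges.

```java
// Hypothetical helper that mirrors the three fragment ranges above.
// It is an illustration only and is not part of ANTLR.
final class LatinRanges {

    // Returns the name of the fragment whose range contains c, or null if none does.
    static String fragmentFor(char c) {
        if (c >= '\u00A0' && c <= '\u00FF') return "Latin1_Supplement";
        if (c >= '\u0100' && c <= '\u017F') return "Latin_ExtendedA";
        if (c >= '\u0180' && c <= '\u024F') return "Latin_ExtendedB";
        return null;
    }

    public static void main(String[] args) {
        // Lowercase accented letters that occur in Portuguese text, written as
        // escapes so the example does not depend on the source file's encoding.
        String accented =
            "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00ed\u00f3\u00f4\u00f5\u00fa\u00e7";
        for (char c : accented.toCharArray()) {
            System.out.printf("U+%04X -> %s%n", (int) c, fragmentFor(c));
        }
        // The same range also admits non-letters, such as the multiplication sign.
        System.out.printf("U+%04X -> %s%n", (int) '\u00d7', fragmentFor('\u00d7'));
    }
}
```

All of the listed Portuguese letters land in Latin1_Supplement, so that fragment alone covers them. The range '\u00A0' .. '\u00FF' is wider than the '\u00c0'..'\u00ff' used in the original WORD rule and includes symbols such as U+00A3 and U+00D7, so a stricter grammar may want a narrower range.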



On Wed, Jun 1, 2011 at 4:53 PM, Nilo Roberto C Paim <nilopaim at gmail.com>
wrote:

Hi all,

I'm new to ANTLR and I'm facing a problem when trying to parse text
that contains accented characters in Brazilian Portuguese.

I've put a word definition in my grammar as follows:

               WORD : ( '\u00c0'..'\u00ff' | 'a'..'z' | 'A'..'Z' | '-' )+ ;

But parsing fails. Words like "não" ("no" in Portuguese) cause the
lexer to throw "Antlr.Runtime.NoViableAltException".

I'm trying to use C#.

Any hint?

TIA

Nilo, from Brasil...

