[antlr-interest] RES: Accentuated chars in brazilian portuguese
Nilo Roberto C Paim
nilopaim at gmail.com
Sat Jun 4 08:19:00 PDT 2011
Thanks to Bart and Douglas for the hints.
I've discovered a lot of things in this process, including the encoding of
my input file.
I'm on my way now.
From: Douglas Godfrey [mailto:douglasgodfrey at gmail.com]
Sent: Thursday, June 2, 2011 04:48
To: Nilo Roberto C Paim
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Accentuated chars in brazilian portuguese
Look up the Latin Unicode code pages on Wikipedia and add the Unicode code
ranges for accented Latin-1 characters to your WORD rule.
Latin1_Supplement : '\u00A0' .. '\u00FF';
Latin_ExtendedA : '\u0100' .. '\u017F';
Latin_ExtendedB : '\u0180' .. '\u024F';
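A minimal sketch of how these ranges might be folded into the WORD rule, assuming ANTLR 3 syntax. The `fragment` keyword and the combined WORD rule below are not part of the original message; marking the ranges as fragments is an assumption, meant to keep them from being emitted as standalone tokens that compete with WORD.

```antlr
// Assumption: ANTLR 3 lexer rules. Fragment rules are helpers that
// can only be referenced from other lexer rules; they never produce
// tokens of their own.
fragment Latin1_Supplement : '\u00A0'..'\u00FF';
fragment Latin_ExtendedA   : '\u0100'..'\u017F';
fragment Latin_ExtendedB   : '\u0180'..'\u024F';

// WORD accepts ASCII letters, the hyphen, and the accented ranges above.
WORD : ( 'a'..'z'
       | 'A'..'Z'
       | '-'
       | Latin1_Supplement
       | Latin_ExtendedA
       | Latin_ExtendedB
       )+ ;
```

The "ã" in "não" is U+00E3, which falls inside Latin1_Supplement. The ranges can only match if the lexer receives correctly decoded characters, so the input file has to be read with the encoding it was actually saved in.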
On Wed, Jun 1, 2011 at 4:53 PM, Nilo Roberto C Paim <nilopaim at gmail.com> wrote:
I'm a newbie using ANTLR and I'm facing a problem when trying to parse text
that contains accented characters in Brazilian Portuguese.
I've put a word definition in my grammar as follows:
WORD : ( '\u00c0'..'\u00ff' | 'a'..'z' |
'A'..'Z' | '-' )+ ;
But I have had no success parsing. Words like "não" ("no" in Portuguese) cause
the lexer to throw "Antlr.Runtime.NoViableAltException".
I'm trying to use C#.
Nilo, from Brasil...