[antlr-interest] Problem parsing unit symbols

Mark van Assem mark at cs.vu.nl
Fri Nov 6 04:25:27 PST 2009


Hi,

> The Ångstrom symbol is capital-A-ring (\u00C5 or \u212B), by the way.

Correct, I wasn't precise. I have to parse text taken from Excel files
written by many different people, who will probably spell the same unit
in several ways, e.g. "Å", "Ångstrom", "ångström", etc.
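
A rough sketch of how such variants might be folded together before the
text reaches the parser, assuming a Python pre-processing step. The name
normalize_unit and the list of spellings are my own assumptions, not
something from ANTLR or from my grammar:

```python
import unicodedata

# Hypothetical list of spelled-out variants (already case-folded).
ANGSTROM_WORDS = {"ångstrom", "ångström", "angstrom", "angström"}

def normalize_unit(text: str) -> str:
    """Map common spellings of the ångström unit to the character U+00C5."""
    # NFC folds U+212B (ANGSTROM SIGN) and "A" + U+030A (combining ring)
    # into the single code point U+00C5.
    nfc = unicodedata.normalize("NFC", text.strip())
    if nfc.casefold() in ANGSTROM_WORDS:
        return "\u00C5"
    return nfc
```

With this, the grammar would only need to match \u00C5; any other cell
content is passed through unchanged apart from NFC normalization.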

> The grammar includes alpha, not the Ångstrom symbol, so that explains
> the interpreter behaviour. The behaviour when run on a file is likely

My bad, I stripped the wrong part of my original file. Thanks for 
spotting this.

> Also, either make sure that the file does not contain an initial BOM
> (Byte Order Mark, \uFEFF), or match that character in your grammar.

How can I tell whether a BOM is present in a file? Is there an editor
or viewer that can show it?
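
One check I could imagine doing myself is to look at the first bytes of
the file. A minimal sketch, assuming Python is available; detect_bom and
file_bom are hypothetical helper names, and only the common UTF-8/16/32
marks are covered:

```python
import codecs

# Check the 4-byte marks first: the UTF-32 LE mark starts with the
# same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if absent."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

def file_bom(path):
    """Read the first four bytes of a file and report its BOM, if any."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

A hex viewer should show the same thing: a UTF-8 file with a BOM starts
with the bytes EF BB BF.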

Many thanks,
Mark.

