[antlr-interest] Problem parsing unit symbols
Mark van Assem
mark at cs.vu.nl
Fri Nov 6 04:25:27 PST 2009
Hi,
> The Ångstrom symbol is capital-A-ring (\u00C5 or \u212B), by the way.
Correct, I wasn't precise. I have to parse text taken from Excel files
written by many different people, who will probably write the same unit
in several ways, e.g. "Å", "Ångstrom", "ångström", etc.
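
One way to cope with such variants is to normalise each token before it
reaches the grammar. The following is only a rough sketch in Python, not
part of the grammar under discussion: the function name is_angstrom and
the set ANGSTROM_NAMES are hypothetical, and the list of spellings is an
assumption that would need extending for real data. It relies on Unicode
NFC normalisation, which maps the Angstrom sign (\u212B) and the sequence
"A" + combining ring (\u030A) to capital-A-ring (\u00C5).

```python
import unicodedata

# Hypothetical list of spelled-out names, stored casefolded.
# \u00e5 is a-ring, \u00f6 is o-umlaut. Extend this for real data.
ANGSTROM_NAMES = {
    "\u00e5ngstrom",        # "ångstrom"
    "\u00e5ngstr\u00f6m",   # "ångström"
}


def is_angstrom(token: str) -> bool:
    """Return True if token looks like a way of writing the Angstrom unit."""
    # NFC turns U+212B and "A" + U+030A into the single character U+00C5.
    text = unicodedata.normalize("NFC", token.strip())
    # The bare symbol is compared case-sensitively, since case usually
    # matters in unit symbols.
    if text == "\u00c5":
        return True
    # Spelled-out names are compared case-insensitively.
    return text.casefold() in ANGSTROM_NAMES
```

With this sketch, is_angstrom("\u212B") and is_angstrom("Ångstrom") are
both true, while is_angstrom("alpha") is false.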
> The grammar includes alpha, not the Ångstrom symbol, so that explains
> the interpreter behaviour. The behaviour when run on a file is likely
My bad, I stripped the wrong part of my original file. Thanks for
spotting this.
> Also, either make sure that the file does not contain an initial BOM
> (Byte Order Mark, \uFEFF), or match that character in your grammar.
How can I check whether a file contains one? Is there an editor or
viewer that can help me with this?
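
For what it is worth, a BOM can also be detected without a special
editor by looking at the first few bytes of the file. A hex viewer shows
EF BB BF at the start of a UTF-8 file that has one. The sketch below does
the same check in Python using the BOM constants from the standard
codecs module. The function names detect_bom and file_bom are
hypothetical, and it assumes the file is in one of the UTF encodings.

```python
import codecs

# UTF-32 must be tested before UTF-16, because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(head: bytes):
    """Return the name of the BOM that head starts with, or None."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


def file_bom(path: str):
    """Read the first four bytes of the file at path and check for a BOM."""
    with open(path, "rb") as handle:
        return detect_bom(handle.read(4))
```

If file_bom reports "UTF-8", opening the file in Python with
encoding="utf-8-sig" drops the BOM while decoding, so the text could be
rewritten without it before it is passed to the parser.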
Many thanks,
Mark.