[antlr-interest] Problem parsing unit symbols
David-Sarah Hopwood
david-sarah at jacaranda.org
Thu Nov 5 11:18:40 PST 2009
Mark van Assem wrote:
> Hello Antlers,
>
> I'm designing a lexer/parser for units of measure (e.g. meters,
> seconds). In that process I'm trying to match symbols like Ω (Ohm) and å
> (angstrom).
The Ångstrom symbol is capital-A-ring (\u00C5 or \u212B), by the way.
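Since the symbol can arrive as either code point, it may help to fold both into one form before lexing. A minimal sketch, assuming plain Java and NFC normalization (the class and method names here are made up for illustration, not part of ANTLR):

```java
import java.text.Normalizer;

final class AngstromCodePoints {
    static final String A_RING = "\u00C5";        // LATIN CAPITAL LETTER A WITH RING ABOVE
    static final String ANGSTROM_SIGN = "\u212B"; // ANGSTROM SIGN

    /** Returns the NFC form of the input; NFC maps U+212B onto U+00C5. */
    static String canonical(String symbol) {
        return Normalizer.normalize(symbol, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // The two code points are distinct strings until normalized.
        System.out.println(A_RING.equals(ANGSTROM_SIGN));            // false
        System.out.println(canonical(ANGSTROM_SIGN).equals(A_RING)); // true
    }
}
```

With the input normalized this way, the grammar would only need to match \u00C5.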
> Below is the relevant part of the grammar - the part that treats
> symbols. The grammar checks out OK in ANTLRWorks, but I get an
> EarlyExitException when I run it on a file that contains two lines: the
> Ohm sign on the first and the angstrom sign on the second. The
> behaviour is different in the interpreter: there the first line is
> parsed OK, but for the second line a NoViableAltException is given.
The grammar includes alpha, not the Ångstrom symbol, so that explains
the interpreter behaviour. The behaviour when run on a file is likely
to be a character encoding issue; make sure that the charset parameter
to ANTLRInputStream matches the encoding of your file (probably UTF-8).
Also, either make sure that the file does not contain an initial BOM
(Byte Order Mark, \uFEFF), or match that character in your grammar.
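If you would rather strip the BOM before the lexer sees it, something along these lines might do. This is only a sketch, assuming the file is UTF-8 and small enough to read into memory; UnitInput and decodeStrippingBom are hypothetical names, not ANTLR API:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class UnitInput {
    private static final char BOM = '\uFEFF';

    /** Decodes bytes with an explicit charset and drops one leading BOM, if present. */
    static String decodeStrippingBom(byte[] bytes, Charset charset) {
        // Java's UTF-8 decoder keeps a leading BOM as U+FEFF rather than skipping it.
        String text = new String(bytes, charset);
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // Stand-in for the file contents: BOM, Ohm sign, newline, A-ring, newline.
        byte[] file = "\uFEFF\u03A9\n\u00C5\n".getBytes(StandardCharsets.UTF_8);
        String text = decodeStrippingBom(file, StandardCharsets.UTF_8);
        System.out.println(text.length()); // 4: the BOM is gone
    }
}
```

The resulting string could then be handed to the lexer, for example through ANTLR 3's ANTLRStringStream, instead of letting a stream guess the platform default encoding.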
--
David-Sarah Hopwood ⚥ http://davidsarah.livejournal.com