[antlr-interest] Unexpected char when generating lexer
Gregg Reynolds
dev at arabink.com
Tue Dec 20 13:54:46 PST 2005
Thiago Arrais wrote:
> 2005/12/20, Gregg Reynolds <dev at arabink.com>:
>
>>Try using single quotes: '\u0000'. That works for me.
>
>
> Weird. The following grammar (with single quotes) gives me the same error.
>
Exactly the same message? That seems odd indeed. If I understand Java
correctly,
Sorry, I read and replied in haste. It looks like it may be a bug. I dug
around in the source a bit; the problem may be in mTEXT_ARG in
ActionLexer.java, which (I think) gets control after "$setText(" is
recognized and looks chars up in BitSets that I don't understand; that's
as far as I could get. The message seems to come from
NoViableAltForCharException.java. (This is antlr-2.7.5.)
As a workaround I would experiment, e.g. by putting a space in front of
\u0000, using a variable, etc. Or, do you have an editor that will
allow you to insert the hex value directly?
Also try something like \u0065 to see if "ordinary" chars work. Maybe
find the lowest \u00xx that will work. There's a hard-coded test for
LA(2) >= '\u0003' && LA(2) <= '\u00ff'; I don't know why, but note that
\u0000 falls below that range.
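
To make the range test quoted above concrete, here is a minimal Java sketch that applies the same comparison to a few sample characters. It is not copied from ActionLexer.java; the class name RangeCheck and the method inQuotedRange are made up for illustration, and the sketch only shows which characters pass the comparison, not what ANTLR does with the result.

```java
public class RangeCheck {
    // Same comparison as the test quoted from the ANTLR source,
    // applied to a plain char instead of LA(2). Hypothetical helper.
    public static boolean inQuotedRange(char c) {
        return c >= '\u0003' && c <= '\u00ff';
    }

    public static void main(String[] args) {
        // Characters below, inside, and above the quoted range.
        char[] samples = { '\u0000', '\u0002', '\u0003', '\u0065', '\u00ff', '\u0100' };
        for (char c : samples) {
            // Print the code point in hex and whether it passes the test.
            System.out.printf("U+%04X -> %b%n", (int) c, inQuotedRange(c));
        }
    }
}
```

If the lexer really does take this branch only for characters in that range, then \u0000, \u0001 and \u0002 would be the only values in the \u00xx block that get excluded, which is one way to narrow down the experiment.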
FYI: I'm primarily interested in using unicode with Antlr; does anybody
know of any documentation on the semantics of the BitSets defined in
ActionLexer? Presumably they represent character classes of some sort?
Another general question: does Antlr read in a grammar file as a byte
stream or a character stream?
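
To show why the byte-versus-character question matters for Unicode input, here is a small Java sketch using only the standard java.io classes. It says nothing about which of the two ANTLR actually uses; the class name ByteVsChar and its helper methods are hypothetical, and UTF-8 is an assumed encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ByteVsChar {
    // Counts the units seen when the data is read as a raw byte stream.
    public static int countBytes(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    // Counts the units seen when the same data is decoded as UTF-8
    // and read through a character stream.
    public static int countChars(byte[] data) {
        Reader r = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        int n = 0;
        try {
            while (r.read() != -1) {
                n++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return n;
    }

    public static void main(String[] args) {
        // Latin 'a' followed by U+0627 (ARABIC LETTER ALEF), which
        // takes two bytes in UTF-8.
        byte[] utf8 = "a\u0627".getBytes(StandardCharsets.UTF_8);
        System.out.println("units as bytes: " + countBytes(utf8));
        System.out.println("units as chars: " + countChars(utf8));
    }
}
```

A lexer fed from the byte stream would see each half of a multi-byte character as a separate input unit, while one fed from a Reader would see a single char, so the answer decides whether character ranges in a grammar can mean what they appear to mean.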
Hope it helps,
gregg