[antlr-interest] Unexpected char when generating lexer

Gregg Reynolds dev at arabink.com
Tue Dec 20 13:54:46 PST 2005


Thiago Arrais wrote:
> 2005/12/20, Gregg Reynolds <dev at arabink.com>:
> 
>>Try using single quotes:  '\u0000'.  That works for me.
> 
> 
> Weird. The following grammar (with single quotes) gives me the same error.
> 
Exactly the same message?  That seems odd indeed.  If I understand Java 
correctly,

Sorry, I read and replied in haste.  It looks like maybe a bug.  I dug 
around in the source a bit; it may be in mTEXT_ARG in ActionLexer.java, 
which (I think) gets control after "$setText(" is recognized and looks 
chars up in BitSets that I don't understand; that's as far as I could 
get.  The message seems to come from NoViableAltForCharException.java. 
(This is antlr-2.7.5)

As a workaround, I would experiment by, e.g., putting a space in front
of \u0000, using a variable, etc.  Or do you have an editor that will
allow you to insert the hex value directly?
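
To illustrate the "using a variable" idea: a minimal Java sketch,
assuming the action code is ordinary Java and that the trouble comes
from the escape text appearing literally in the grammar.  The class and
method names here are hypothetical, not part of ANTLR.

```java
class NulText {
    // Hypothetical helper: builds the NUL character from a number, so no
    // backslash escape for it has to appear in the source text.
    static final char NUL = (char) 0;

    static String nulText() {
        return String.valueOf(NUL);
    }

    public static void main(String[] args) {
        String s = nulText();
        System.out.println("length = " + s.length());
        System.out.println("code point = " + (int) s.charAt(0));
    }
}
```

The action could then refer to the variable instead of the escape.
Whether that gets past the lexer error is untested.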

Also try something like \u0065 or the like to see if "ordinary" chars 
work.  Maybe find the lowest \u00xx that will work.  There's a 
hard-coded test for LA(2) >= '\u0003' && LA(2) <= '\u00ff'.  Dunno why.
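
To make the bounds in that test concrete, here is a small Java sketch
that applies the same comparison to a few sample characters.  It only
illustrates the quoted range; it is not the code from ActionLexer.java,
and the class and method names are made up.

```java
class RangeCheck {
    // Same comparison as the test quoted above, applied to one char.
    static boolean inQuotedRange(char c) {
        return c >= '\u0003' && c <= '\u00ff';
    }

    public static void main(String[] args) {
        // Sample values are written as hex numbers rather than escapes.
        char[] samples = { 0x0000, 0x0002, 0x0003, 0x0065, 0x00ff, 0x0100 };
        for (char c : samples) {
            System.out.printf("U+%04X in range: %b%n", (int) c, inQuotedRange(c));
        }
    }
}
```

By these bounds the values 0x0000 through 0x0002 and everything above
0x00ff fall outside the range.  Whether that is what triggers the error
here is only a guess.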

FYI: I'm primarily interested in using unicode with Antlr; does anybody 
know of any documentation on the semantics of the BitSets defined in 
ActionLexer?  Presumably they represent character classes of some sort?

Another general question:  does Antlr read in a grammar file as a byte 
stream or a character stream?
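
The difference matters for non-ASCII input.  As a sketch in plain Java
(this says nothing about what Antlr itself does; the class and method
names are made up), the same data counted once as bytes and once as
characters decoded from UTF-8:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ByteVsChar {
    // Reads the data as a byte stream and counts the units seen.
    static int countBytes(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    // Reads the same data as a character stream, decoding it as UTF-8.
    static int countChars(byte[] data) {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int n = 0;
            while (r.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Arabic letter alef (U+0627) takes two bytes in UTF-8.
        byte[] data = "\u0627".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes: " + countBytes(data));
        System.out.println("chars: " + countChars(data));
    }
}
```

A lexer fed from the byte stream would see two input units for that
letter, while one fed from the decoded character stream would see one.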

Hope it helps,

gregg

