[antlr-interest] Lexer question

Mon Dec 12 12:29:37 PST 2005

Hi Guys,

Hopefully someone can help me out.  I would like my lexer to create a
special INVALID_CHARACTER token for any invalid characters it finds and
send them along to the parser so that it can be handled in the parser.

I have my char vocabulary set to '\0'..'\377' and have the filter option
set to a INVALID_CHARACTER.  This way all invalid characters ( such as
Unicode characters ) are matched by the filter rule.

Doing this I can report the character as an error but I really do need
that character to still be a part of the token stream ( rather then
ignored ).  So far the only way I've thought of to be able to accomplish
this is to make all my rules protected and have one giant rule that
matches all my subrules, this would be a major pain.  It's either that
or hack into the lexer generator and make the filter rule create and
return a token.

Any one have any better ideas?

Thanks,
Ari

Ari Steinberg
Engineer Extraordinaire @
Embarcadero Technologies
416-593-1585 x231
ari.steinberg at embarcadero.com