[antlr-interest] Creating a lexer that returns a token for bad characters

Sun Apr 27 11:51:50 PDT 2008

Hi,

I'm trying to create a lexer that will return a token for invalid
characters.  For example, suppose you have this:

INT : 'int';

WHITESPACE : (' ')+;

And the input is 'int   iint'.  I would want a token stream of INT('int'),
WHITESPACE('   ') and BAD('iint').  I only just got the ANTLR book; is such
a thing possible?  It looks like I would have to write a new nextToken()
method that records where the bad characters start and keeps consuming until
it hits a valid token.  I would then roll back that valid token and create a
bad token for the part recorded.  Is there a better way to do this?  Any
help would be appreciated, thanks.
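
For illustration, the nextToken() idea described above could be sketched
roughly as follows.  This is standalone Python, not ANTLR-generated code and
not the ANTLR runtime API; the names Token, RULES, match_valid, tokenize and
the resume_on parameter are all hypothetical, and the two rules are simply
the INT and WHITESPACE rules from the example written as regular expressions.

```python
import re
from typing import Iterator, NamedTuple, Optional, Set


class Token(NamedTuple):
    type: str
    text: str


# The two rules from the example, as regexes (hypothetical stand-in for the
# generated lexer). Order matters if several rules match at one position.
RULES = [
    ("INT", re.compile(r"int")),
    ("WHITESPACE", re.compile(r" +")),
]


def match_valid(text: str, pos: int) -> Optional[Token]:
    """Return the first rule that matches at pos, or None if none does."""
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        if m:
            return Token(name, m.group())
    return None


def tokenize(text: str, resume_on: Optional[Set[str]] = None) -> Iterator[Token]:
    """Yield valid tokens, wrapping unmatched runs in BAD tokens.

    resume_on (an assumption, not part of the original description): if
    given, only these token types may end a run of bad characters.
    """
    pos = 0
    bad_start = None  # index where the current bad run began, if any
    while pos < len(text):
        tok = match_valid(text, pos)
        can_resume = tok is not None and (
            bad_start is None or resume_on is None or tok.type in resume_on
        )
        if can_resume:
            if bad_start is not None:
                # A valid token starts here: emit the recorded bad part first.
                yield Token("BAD", text[bad_start:pos])
                bad_start = None
            yield tok
            pos += len(tok.text)
        else:
            if bad_start is None:
                bad_start = pos
            pos += 1  # consume one character and try again
    if bad_start is not None:
        yield Token("BAD", text[bad_start:])
```

One thing the sketch makes visible: applied literally to 'int   iint', the
"consume until a valid token matches" rule gives BAD('i') followed by
INT('int'), because 'int' matches again one character into 'iint'.  Calling
tokenize('int   iint', resume_on={'WHITESPACE'}) instead gives INT('int'),
WHITESPACE('   '), BAD('iint'), on the assumption that a bad run should only
end at whitespace.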

-- bryan
