[antlr-interest] Creating a lexer that returns a token for bad characters

Bryan H. Haber bryan.haber at gmail.com
Sun Apr 27 11:51:50 PDT 2008


Hi,

 

I'm trying to create a lexer that will return a token for invalid
characters.  For example, if you have this:

 

INT : 'int';

WHITESPACE : (' ')+;

 

and the input is 'int   iint', I would want a token stream of INT('int'),
WHITESPACE('   ') and BAD('iint').  I just got the ANTLR book, but is such a
thing possible?  It looks like I would have to write a new nextToken()
method that records where the bad characters start and keeps consuming until
it hits a valid token.  I would then roll back that valid token and create a
bad token for the part recorded.  Is there a better way to do this?  Any help
would be appreciated, thanks.
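
To make the idea concrete, here is a minimal sketch in plain Java. It does not use the ANTLR runtime at all: the class BadTokenLexer, the Tok type, and the tokenize() method are hypothetical names made up for illustration, and the two rules are hard-coded from the grammar above. One assumption is worth stating loudly: if a bad run ended at the first valid token of any kind, 'iint' would come out as BAD('i') followed by INT('int'), so this sketch assumes that only whitespace (or end of input) ends a bad run. Because it looks ahead with startsWith() before consuming anything, it never needs to roll back.

```java
import java.util.ArrayList;
import java.util.List;

public class BadTokenLexer {
    // Hypothetical token type for this sketch; not an ANTLR class.
    public static final class Tok {
        public final String type;
        public final String text;

        Tok(String type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "('" + text + "')";
        }
    }

    // Splits the input into INT, WHITESPACE and BAD tokens.
    public static List<Tok> tokenize(String input) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            if (input.charAt(i) == ' ') {
                // WHITESPACE : (' ')+ ;
                while (i < input.length() && input.charAt(i) == ' ') {
                    i++;
                }
                out.add(new Tok("WHITESPACE", input.substring(start, i)));
            } else if (input.startsWith("int", i)) {
                // INT : 'int' ;
                i += 3;
                out.add(new Tok("INT", input.substring(start, i)));
            } else {
                // No rule matches here. Assumption: the bad run extends to
                // the next space or to the end of the input.
                while (i < input.length() && input.charAt(i) != ' ') {
                    i++;
                }
                out.add(new Tok("BAD", input.substring(start, i)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints: [INT('int'), WHITESPACE('   '), BAD('iint')]
        System.out.println(tokenize("int   iint"));
    }
}
```

Whether the same effect can be had inside an ANTLR-generated lexer, for example with a catch-all rule or by overriding nextToken(), is exactly the open question; the sketch only pins down the token stream I am after.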

 

-- bryan
