[antlr-interest] Creating a lexer that returns a token for bad characters
Bryan H. Haber
bryan.haber at gmail.com
Sun Apr 27 13:24:50 PDT 2008
Ah, I hadn't thought of that. Since I do need to recognize identifiers,
'iint' isn't actually a bad token, it's just not a keyword. Thanks Gavin,
I'll try this out.
-----Original Message-----
From: Gavin Lambert [mailto:antlr at mirality.co.nz]
Sent: Sunday, April 27, 2008 12:54 PM
To: Bryan H. Haber; antlr-interest at antlr.org
Subject: Re: [antlr-interest] Creating a lexer that returns a token for bad characters
At 06:51 28/04/2008, Bryan H. Haber wrote:
>INT : 'int';
>WHITESPACE : (' ')+;
>
>And the input is 'int iint'. I would want a token stream of
>INT('int'), WHITESPACE(' ') and BAD('iint'). I just got the
>ANTLR book, but is such a thing possible? It looks like I would
>have to create a new nextToken() method that tracks the start of
>the bad characters and keeps consuming until it hits a valid
>token. I would then roll back that valid token and create a bad
>token for the part recorded. Is there a better way to do this? Any
>help would be appreciated, thanks.
Try adding this as the last lexer rule:
BAD: .+;
Though I *think* this won't do exactly what you want, since it
won't use whitespace as a delimiter; you should end up with
INT('int'), WHITESPACE(' '), BAD('i'), INT('int').
Another option is just to add an ID rule for identifiers; then
'iint' will match as an identifier and you can decide whether it's
good or bad when it reaches the parser. (This one will be
whitespace delimited.)
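
A minimal sketch of the ID approach, assuming ANTLR 3 syntax. The
grammar name "Sketch" and the exact character set allowed in ID are
assumptions for illustration, not something from the original grammar:

```antlr
// Hypothetical lexer grammar; in ANTLR 3 the file would be named Sketch.g.
lexer grammar Sketch;

// Keyword rule, listed before ID so that the exact text 'int' is
// tokenized as INT rather than as an identifier.
INT : 'int';

// Assumed identifier shape: a letter or underscore, followed by any
// number of letters, digits, or underscores.
ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*;

// Unchanged from the original post.
WHITESPACE : (' ')+;
```

With these rules the input 'int iint' should come out as INT('int'),
WHITESPACE(' '), ID('iint'), and the parser (or a later pass) can then
decide whether 'iint' is acceptable where it appears.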