[antlr-interest] Exception on obscure char but then continue lexing?

Martin Olsson mnemo at minimum.se
Mon Aug 1 08:09:38 PDT 2005

Thanks Alexey. I've been struggling with this lexer error handling for 8h
now, but finally it works, at least for the "¤" char!

Now, is it possible to generalize this rule into something that traps more
or less everything that the lexer doesn't understand? I mean, if someone
enters some other obscure char like £ into the source, I want that
highlighted as a syntax error too (still without the lexer/parser stopping).
I could of course add another rule just like the one below, but there are
too many chars that are not used in my language, so I would rather have a
general rule if possible.
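
For readers less familiar with ANTLR internals, the general catch-all asked
about here can be sketched as a plain Java lexer loop. This is a
hypothetical hand-written example, not ANTLR-generated code, and the names
RecoveringLexer, Token and errors are made up for illustration. The final
else branch plays the role of a catch-all error rule: any character the
other branches don't handle (¤, £, ...) is reported, skipped, and lexing
continues with a fresh token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer (not ANTLR-generated) showing the
// "report and skip" idea: a character that no branch recognizes is
// recorded as an error, dropped, and lexing resumes with a fresh token.
class RecoveringLexer {
    // Minimal token: a type name plus the matched text.
    static final class Token {
        final String type;
        final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "(" + text + ")";
        }
    }

    final List<String> errors = new ArrayList<>(); // collected, not thrown
    private final String input;
    private int pos = 0;

    RecoveringLexer(String input) {
        this.input = input;
    }

    List<Token> tokenize() {
        List<Token> tokens = new ArrayList<>();
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++; // whitespace is skipped silently
            } else if (isIdStart(c)) {
                int start = pos;
                while (pos < input.length() && isIdPart(input.charAt(pos))) {
                    pos++;
                }
                tokens.add(new Token("ID", input.substring(start, pos)));
            } else if ("(){};".indexOf(c) >= 0) {
                tokens.add(new Token("PUNCT", String.valueOf(c)));
                pos++;
            } else {
                // Catch-all: every character not handled above lands here,
                // so stray symbols (U+00A4, U+00A3, ...) need no rule of their own.
                errors.add("unexpected char '" + c + "' at offset " + pos);
                pos++; // skip it; the next iteration starts a new token
            }
        }
        return tokens;
    }

    private static boolean isIdStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    private static boolean isIdPart(char c) {
        return isIdStart(c) || (c >= '0' && c <= '9');
    }
}
```

With this sketch, lexing func¤tion yields two ID tokens, func and tion, plus
one recorded error at offset 4. Because the identifier loop stops at ¤, the
two halves are never glued back into function.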

I also have another question on the same theme. My language does
assignments using the ::= operator and comparisons using the == operator.
Lots of users will try to assign variables using the = operator instead.
Currently, when someone writes "x=blah", my lexer believes that the = is
the start of a == and reports "found 'b' but expected '='". That is fine,
but at this point the lexer also aborts, so none of the code after the
faulty assignment gets parsed. Here I would like something similar to
what you showed me, i.e. I don't want the lexer to stop; I just want it to
mark the = as a "bad token" syntax error and then keep on going. Is this
doable too?
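
The = case can be handled the same way: treat a lone = as its own error
case rather than as a broken ==. Below is a hypothetical plain-Java sketch
(again not ANTLR output; AssignLexer, tokens and errors are illustrative
names) in which == becomes EQ, ::= becomes ASSIGN, and a lone = is
reported as a bad token, consumed, and lexing resumes at the next
character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not ANTLR output) for a language where ::= assigns
// and == compares: a lone '=' is reported as a bad token and skipped
// instead of aborting, so the rest of the input is still lexed.
class AssignLexer {
    final List<String> tokens = new ArrayList<>(); // e.g. "WORD:x", "ASSIGN", "EQ"
    final List<String> errors = new ArrayList<>();
    private final String in;
    private int pos = 0;

    AssignLexer(String in) {
        this.in = in;
    }

    // Character k positions ahead, or '\0' past the end of input.
    private char peek(int k) {
        int i = pos + k;
        return i < in.length() ? in.charAt(i) : '\0';
    }

    // Identifiers and numbers are both lexed as WORD tokens in this sketch.
    private static boolean isWord(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    AssignLexer run() {
        while (pos < in.length()) {
            char c = peek(0);
            if (Character.isWhitespace(c)) {
                pos++;
            } else if (isWord(c)) {
                int start = pos;
                while (pos < in.length() && isWord(in.charAt(pos))) {
                    pos++;
                }
                tokens.add("WORD:" + in.substring(start, pos));
            } else if (c == ':' && peek(1) == ':' && peek(2) == '=') {
                tokens.add("ASSIGN");
                pos += 3;
            } else if (c == '=' && peek(1) == '=') {
                tokens.add("EQ");
                pos += 2;
            } else if (c == '=') {
                // Lone '=': instead of failing with "expected '='", record a
                // bad-token error, consume only this '=' and keep going.
                errors.add("bad token '=' at offset " + pos + " (use ::= or ==)");
                pos++;
            } else {
                // Any other unknown character: report and skip it as well.
                errors.add("unexpected char '" + c + "' at offset " + pos);
                pos++;
            }
        }
        return this;
    }
}
```

For the input x=blah followed by y ::= 1 on the next line, the sketch
records one bad-token error at offset 1 and still produces WORD:x,
WORD:blah, WORD:y, ASSIGN and WORD:1, so everything after the faulty
assignment is still lexed. In an ANTLR 2 grammar the same idea would
presumably be an extra alternative in the rule that matches ==, whose
action reports the error and sets Token.SKIP; that variant is untested.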


> You can define a lexer rule that accepts the wrong chars.
> The action for this rule should report an error and skip
> the wrong token. It is very similar to comment handling,
> except for the error reporting.
> In the lexer:
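> // Not 'protected': protected rules are helper rules that
> // nextToken() never tries on its own.
> ERROR
>     : '¤'
>       {
>           reportError("unexpected character '¤'");
>           $setType(Token.SKIP);
>       }
>     ;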
> So, in the case of "func¤tion", the parser will receive
> two ID tokens: "func" and "tion".
> Regards,
> Alexey
> -----
> Alexey Demakov
> TreeDL: Tree Description Language: http://treedl.sourceforge.net
> RedVerst Group: http://www.unitesk.com
> ----- Original Message -----
> From: "Martin Olsson" <mnemo at minimum.se>
> To: <antlr-interest at antlr.org>
> Sent: Monday, August 01, 2005 1:33 PM
> Subject: [antlr-interest] Exception on obscure char but then continue lexing?
>> Hi,
>> I'm using an ANTLR parser to detect syntax errors in a C-like language.
>> Currently, if the lexer runs into, for instance, the char "¤" (which is
>> not part of my language), it will throw a NoViableAltForCharException,
>> which is then wrapped in a TokenStreamRecognitionException. I catch this
>> and display the syntax error in my editor.
>> The problem is that ANTLR seems to stop parsing at that point too.
>> Instead, I would like it to throw an exception as above, but then just
>> ignore that character and resume lexing more or less as if the erroneous
>> char never appeared. (If possible, it should also start over with a
>> flushed buffer, so that the chars "func¤tion" will not be interpreted as
>> a valid "function" token.)
>> Is this possible with ANTLR 2.7.5? Are there any examples of this?
>> Sincerely,
>> Martin
