[antlr-interest] Lexer recognition exceptions

Wed Jan 28 08:51:18 PST 2009

Bruno Marc-Aurele wrote:
> Hi,
>
> I am currently doing a project where the description of our software
> architecture is very important. Therefore, I have to understand the code that's
> generated by ANTLR properly (academic stuff... Need to provide documents and
> follow them...)
>
> I have seen that the generated parser catches the RecognitionExceptions during
> parsing. How about the lexer? There is no catch block, so I understand that the
> exceptions are to be caught by us when we instantiate the lexer object. Am I
> right, or is there some tricky mechanism involved in the base classes that I am
> not aware of?
>   
There is very little that a lexer can do when it encounters something
that it does not like. In fact, all it can really do is print a message
and consume the character it is looking at. The catch block you are
looking for is in the base class: look at the nextToken() method in
Lexer.java, which calls the generated mTokens() method and catches the
RecognitionException there. So you can override the error reporting if
you like.
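
To make that loop shape concrete, here is a standalone Java sketch. It
does not use the ANTLR runtime: MiniLexer, CollectingLexer and
reportError(String) are hypothetical names (the real ANTLR 3 hook takes
a RecognitionException, not a String). When no rule can start at the
current character, the lexer reports, throws that one character away and
tries again; the subclass shows how overriding the reporting hook
changes what the user sees without touching recovery.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generated lexer, not ANTLR's API. It only
// recognises identifiers ([a-z]+) and integers ([0-9]+).
class MiniLexer {
    protected final String input;
    protected int pos = 0;

    MiniLexer(String input) { this.input = input; }

    // Same shape as the loop described above: try to match a token; if no
    // rule applies, report the error, consume the offending character and
    // try again. Returns null at end of input.
    String nextToken() {
        while (pos < input.length()) {
            int start = pos;
            char c = input.charAt(pos);
            if (isLetter(c)) {
                while (pos < input.length() && isLetter(input.charAt(pos))) pos++;
                return "ID:" + input.substring(start, pos);
            }
            if (isDigit(c)) {
                while (pos < input.length() && isDigit(input.charAt(pos))) pos++;
                return "INT:" + input.substring(start, pos);
            }
            // No rule can start with c: report it, throw it away, carry on.
            reportError("unrecognized character '" + c + "' at offset " + pos);
            pos++;
        }
        return null;
    }

    private static boolean isLetter(char c) { return c >= 'a' && c <= 'z'; }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Default behaviour: print a message and keep going.
    protected void reportError(String msg) { System.err.println(msg); }
}

// Overriding only the reporting hook, e.g. to collect messages for users.
class CollectingLexer extends MiniLexer {
    final List<String> errors = new ArrayList<>();

    CollectingLexer(String input) { super(input); }

    @Override
    protected void reportError(String msg) { errors.add(msg); }
}
```

Tokenizing "abc42?x" with CollectingLexer yields ID:abc, INT:42 and
ID:x, with one collected message for the '?' at offset 5.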

However, really you are supposed to code your lexer to handle errors. In
the simplest case, you create a catch-all rule to pick out characters
that you have not otherwise created a match for:

BAD : . { your error handler } ;
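
The effect of such a rule can be sketched in plain Java as well. This is
a hypothetical hand-written lexer, not ANTLR-generated code: the final
branch plays the part of BAD, and the Consumer passed to the constructor
stands in for "your error handler". Because the catch-all always matches
one character, the lexer never has to recover at all; the bad character
simply becomes a token the handler (and later the parser) can deal with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical lexer illustrating a catch-all rule tried last, so it only
// fires when no other rule can match the current character.
class CatchAllLexer {
    record Token(String type, String text, int offset) {}

    private final String input;
    private final Consumer<Token> onBad; // stands in for "your error handler"
    private int pos = 0;

    CatchAllLexer(String input, Consumer<Token> onBad) {
        this.input = input;
        this.onBad = onBad;
    }

    List<Token> tokenize() {
        List<Token> out = new ArrayList<>();
        while (pos < input.length()) {
            int start = pos;
            char c = input.charAt(pos);
            if (c == ' ') {                      // WS : ' ' ;  (skipped)
                pos++;
            } else if (c >= '0' && c <= '9') {   // INT : '0'..'9'+ ;
                while (pos < input.length()
                        && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
                    pos++;
                }
                out.add(new Token("INT", input.substring(start, pos), start));
            } else if (c == '+') {               // PLUS : '+' ;
                pos++;
                out.add(new Token("PLUS", "+", start));
            } else {                             // BAD : . ;
                pos++;
                Token bad = new Token("BAD", String.valueOf(c), start);
                onBad.accept(bad); // report it in terms the user understands
                out.add(bad);      // this sketch keeps it; it could also be dropped
            }
        }
        return out;
    }
}
```

With input "1 + $2" the handler is called once, for the '$' at offset 4,
and the token types come out as INT, PLUS, BAD, INT.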

However, you also need to cater for things that can go wrong once the
prediction has selected a particular rule and the match then fails
partway through, for instance unterminated string literals. Code as
much of this as is useful/practical, then override the error reporting
mechanism so that it gives your users a message that makes more sense
than "Unrecognized character". It could also do custom recovery, I
suppose.
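
A similar standalone sketch for the unterminated-literal case follows.
StringLexer and its message text are hypothetical, and it assumes, just
for the sketch, that string literals may not span lines. Once the
opening quote has committed the lexer to the string rule, reaching end
of line or end of input produces a message that names the real problem
and where the literal started, and recovery ends the literal there so
lexing can continue on the next line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lexer that only cares about string literals; every other
// character is skipped. Tracks line/column so errors can point at the
// opening quote.
class StringLexer {
    private final String input;
    private int pos = 0, line = 1, col = 1;
    final List<String> tokens = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    StringLexer(String input) { this.input = input; }

    void run() {
        while (pos < input.length()) {
            if (input.charAt(pos) == '"') {
                lexString(); // prediction has committed us to STRING
            } else {
                advance();
            }
        }
    }

    private void lexString() {
        int startLine = line, startCol = col;
        StringBuilder text = new StringBuilder();
        advance(); // opening quote
        while (pos < input.length()
                && input.charAt(pos) != '"' && input.charAt(pos) != '\n') {
            text.append(input.charAt(pos));
            advance();
        }
        if (pos < input.length() && input.charAt(pos) == '"') {
            advance(); // closing quote
        } else {
            // Specific message instead of a generic "unrecognized character".
            errors.add("unterminated string literal starting at line "
                    + startLine + ", column " + startCol);
        }
        // Recovery: emit what we have and resume at the newline (or EOF).
        tokens.add("STRING:" + text);
    }

    private void advance() {
        if (input.charAt(pos) == '\n') { line++; col = 1; } else { col++; }
        pos++;
    }
}
```

For the input x = "ok" on one line and y = "oops on the next, this
yields STRING:ok and STRING:oops, plus one message pointing at line 2,
column 5.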

Also see the following wiki page for an example of coding for errors:

http://antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs

Jim