[antlr-interest] Development of an XQuery parser with full-text extensions, project report

Terence Parr parrt at cs.usfca.edu
Tue Dec 25 17:31:11 PST 2007


On Dec 25, 2007, at 1:25 PM, Johannes Luber wrote:

> Terence Parr schrieb:
>>
>> On Dec 25, 2007, at 10:23 AM, Johannes Luber wrote:
>>> FYI, in ANTLR 3.1
>>> all automatic recovery has been removed - at least that is what is
>>> supposed to happen there.
>>
>> Actually, it will be there, no problem.  Rule-level recovery will be the
>> default, but within-rule recovery will be turned off.  You can turn it
>> back
>> on with a simple method override.
>
> Can you give an example to explain the difference between both
> situations and why the differentiation makes sense?

The new situation will be that any syntax error (no viable alternative or
mismatched token) will throw an exception, which is caught at the bottom
of the rule.  The catch clause will report the error and initiate error
recovery.
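
Roughly, a generated rule ends up shaped like this.  The following is only
a sketch, not real generated output; the rule, token types, and FOLLOW sets
are made up just to show where the catch clause sits:

    import org.antlr.runtime.*;

    // Sketch only: the approximate shape of a generated rule method when all
    // recovery happens at the rule level.  Token types and FOLLOW sets are
    // placeholders, not copied from real generated code.
    public class StatParserSketch extends Parser {
        public static final int ID   = 4;   // illustrative token types
        public static final int SEMI = 5;
        public static final BitSet FOLLOW_ID_in_stat   = new BitSet(new long[]{1L << SEMI});
        public static final BitSet FOLLOW_SEMI_in_stat = new BitSet(); // placeholder

        public StatParserSketch(TokenStream input) {
            super(input);
        }

        // Some 3.x runtimes declare this abstract on BaseRecognizer.
        public String getSourceName() {
            return "sketch";
        }

        // stat : ID ';' ;
        public final void stat() throws RecognitionException {
            try {
                // Any mismatched token or "no viable alternative" error in
                // the rule body throws a RecognitionException...
                match(input, ID, FOLLOW_ID_in_stat);
                match(input, SEMI, FOLLOW_SEMI_in_stat);
            }
            catch (RecognitionException re) {
                // ...which is caught once here, at the bottom of the rule:
                // report it, then resynchronize by consuming tokens.
                reportError(re);
                recover(input, re);
            }
        }
    }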

As an option, and as the default for pre-3.1, you can have ANTLR try to
recover within the rule.  This is where it does its single-token insertion
or deletion.  If you forget a ')', for example, it can often continue
within the rule, which often results in much better error recovery.  Per my
previous e-mails, though, this can cause trouble for actions that expected
that previous token to be matched.  Imagine an action that references $ID
but the ID did not actually get matched!  The parser inserted one magically,
and it has useless data.
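
The switch is a method override on the recognizer.  As a sketch only (the
hook's exact name and signature have moved around between 3.0 and the 3.1
work, so check BaseRecognizer in your runtime), forcing the parser never to
conjure a token for an action looks roughly like this; XQueryParser stands
in for whatever parser ANTLR generated for you:

    import org.antlr.runtime.*;

    // Sketch only: turn off within-rule single-token insertion/deletion by
    // throwing from the mismatch-recovery hook, so errors always reach the
    // rule-level catch clause instead of handing actions a conjured token.
    // "XQueryParser" is illustrative; substitute your generated parser.
    public class StrictXQueryParser extends XQueryParser {

        public StrictXQueryParser(TokenStream input) {
            super(input);
        }

        // Signature follows a 3.1-era BaseRecognizer; earlier 3.x releases
        // use a slightly different one, so adjust to match your runtime.
        protected Object recoverFromMismatchedToken(IntStream input,
                                                    int ttype,
                                                    BitSet follow)
            throws RecognitionException
        {
            throw new MismatchedTokenException(ttype, input);
        }
    }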

>>> A glance into Lexer.java tells me that nextToken() still has the
>>> same unfortunate behaviour with no added throws-clause. Maybe Ter  
>>> didn't
>>> get to it yet.
>>>
>>
>> remind me which issue we're talking about again?
>
> nextToken() catches RuleMismatchException (it could be another kind of
> exception), so it doesn't declare that it throws
> RuleMismatchException.
> But there are cases where someone wants to override nextToken()
> and
> rethrow the exception - and can't.

Hmm...well, I just looked, and it seems like adding a throws clause would
cause a ripple effect where you have to put a catch clause everywhere.  It
should be okay, as most of the references are within a parser rule, but
don't you think it is simply easier to throw either an Error or a runtime
exception if you want the entire parsing process to stop upon a lexical
error?
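
A minimal sketch of that suggestion, assuming the 3.x Lexer whose
nextToken() catches the RecognitionException and routes it through
reportError(); XQueryLexer stands in for whatever lexer ANTLR generated
for you:

    import org.antlr.runtime.*;

    // Sketch only: stop the whole parse on the first lexical error.
    // nextToken() catches RecognitionException and calls reportError(), so
    // an unchecked exception thrown here escapes nextToken() without any
    // signature change and aborts the invoking parser as well.
    // "XQueryLexer" is illustrative; substitute your generated lexer.
    public class BailOutLexer extends XQueryLexer {

        public BailOutLexer(CharStream input) {
            super(input);
        }

        @Override
        public void reportError(RecognitionException e) {
            throw new RuntimeException(
                "lexical error at " + e.line + ":" + e.charPositionInLine, e);
        }
    }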

Throwing a recognition exception from a token stream is undefined.  Who is
supposed to catch it?  You might not even be using a parser.  I'm not sure
recognition exceptions should come out of the lexer at all; when there is a
problem, it should keep going.  If you want to bail out and stop lexing, you
must make the invoking parser fail as well.  I don't think we should pass a
lexical recognition exception on to the parser, because it is not an
exception for the parser.

   Does that make sense?

Ter

