[antlr-interest] Development of an XQuery parser with full-text extensions, project report
Terence Parr
parrt at cs.usfca.edu
Tue Dec 25 17:31:11 PST 2007
On Dec 25, 2007, at 1:25 PM, Johannes Luber wrote:
> Terence Parr schrieb:
>>
>> On Dec 25, 2007, at 10:23 AM, Johannes Luber wrote:
>>> FYI, in ANTLR 3.1
>>> all automatic recovery has been removed - at least it should happen
>>> there.
>>
>> Actually, it will be there no problem. Rule-level recovery will be
>> default but within-rule recovery should be turned off. Can turn it
>> back on with a simple method override.
>
> Can you give an example to explain the difference between both
> situations and why the differentiation makes sense?
The new situation will be that any syntax error (no viable alternative
or mismatched token) will throw an exception, which is caught at the
bottom of the rule. The catch clause will report an error and
initiate error recovery.
As an option, and the default before 3.1, you can have ANTLR try to
recover within the rule. This is where it does its single-token
insertion or deletion. If you forget a ')', for example, it can often
continue within the rule, which usually gives much better error
recovery. As I said in my previous e-mails, though, this can cause
trouble for actions that expect the previous token to have been
matched. Imagine an action that references $ID when the ID did not
actually get matched! The parser inserted one magically, but it
carries useless data.
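To make the difference concrete, here is a minimal sketch in plain Java. It is not ANTLR-generated code and none of the names are ANTLR API: the class RecoverySketch, the Mismatch exception, and the toy rule decl : 'var' ID ';' are all assumptions made up for illustration. Real within-rule recovery also consults follow sets before conjuring a token; this sketch conjures one unconditionally to stay short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy recognizer for the hypothetical rule  decl : 'var' ID ';' . Not ANTLR API. */
class RecoverySketch {
    /** Stand-in for a recognition exception (hypothetical name). */
    static class Mismatch extends RuntimeException {
        Mismatch(String message) { super(message); }
    }

    private final List<String> tokens;
    private final boolean recoverWithinRule;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();
    final List<String> declared = new ArrayList<>(); // filled by the rule's "action"

    RecoverySketch(List<String> tokens, boolean recoverWithinRule) {
        this.tokens = tokens;
        this.recoverWithinRule = recoverWithinRule;
    }

    /** Token kind: any word other than 'var' is an ID; everything else is its own kind. */
    private static String kindOf(String token) {
        return Character.isLetter(token.charAt(0)) && !token.equals("var") ? "ID" : token;
    }

    private String lookahead(int k) {
        int i = pos + k - 1;
        return i < tokens.size() ? tokens.get(i) : "<EOF>";
    }

    private String match(String kind) {
        String current = lookahead(1);
        if (kindOf(current).equals(kind)) { pos++; return current; }
        String message = "expected " + kind + " but found " + current;
        if (!recoverWithinRule) throw new Mismatch(message); // handled at the bottom of the rule
        errors.add(message);
        String next = lookahead(2);
        if (kindOf(next).equals(kind)) { // single-token deletion: skip the stray token
            pos += 2;
            return next;
        }
        return "<missing " + kind + ">"; // single-token insertion: conjure a token, consume nothing
    }

    void decl() {
        try {
            match("var");
            String id = match("ID");
            match(";");
            declared.add(id); // the action that references $ID
        } catch (Mismatch e) {
            // Rule-level recovery: report, then resynchronize past the next ';'.
            errors.add(e.getMessage());
            while (pos < tokens.size() && !tokens.get(pos++).equals(";")) { }
        }
    }

    public static void main(String[] args) {
        for (boolean within : new boolean[] {false, true}) {
            RecoverySketch p = new RecoverySketch(Arrays.asList("var", ";"), within);
            p.decl();
            System.out.println("within-rule=" + within + " declared=" + p.declared
                    + " errors=" + p.errors);
        }
    }
}
```

On the input var ; the rule-level variant reports one error and never runs the action, while the within-rule variant finishes the rule and hands the action the conjured text "<missing ID>", which is exactly the useless data described above.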
>>> A glance into Lexer.java tells me that nextToken() still has the
>>> same unfortunate behaviour with no added throws-clause. Maybe Ter
>>> didn't get to it yet.
>>>
>>
>> remind me which issue we're talking about again?
>
> nextToken() catches RuleMismatchException (could be another kind of
> exception), so it doesn't declare that it throws
> RuleMismatchException.
> But there are instances where someone wants to override nextToken()
> to rethrow the exception - and can't.
Hmm...well, I just looked, and it seems it would cause a ripple
effect where you have to put the catch clause everywhere. It should be
okay, as most of the references are within a parser rule, but don't you
think it is simply easier to throw either an error or a run-time
exception if you want the entire parsing process to stop upon a lexical
error?
Throwing a recognition exception from a token stream is undefined. Who
is supposed to catch it? You might not even be using a parser. I'm not
sure recognition exceptions should come out of the lexer at all; when
there is a problem, it should keep going. If you want to bail out and
stop lexing, you must make the invoking parser fail as well. I don't
think we should pass a lexical recognition exception on to the parser,
because it is not an exception for the parser.
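A rough sketch of that idea in plain Java, under stated assumptions: ToyLexer, StrictLexer, LexicalBailout, and the reportError hook here are invented for this toy and are not the ANTLR runtime classes. By default the lexer reports a bad character and keeps going; a subclass that wants everything to stop throws an unchecked exception from the error hook, so nextToken() needs no throws clause.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only; these classes are hypothetical, not ANTLR's. */
class LexerBailSketch {
    /** Unchecked, so it can escape nextToken() without a throws clause. */
    static class LexicalBailout extends RuntimeException {
        LexicalBailout(String message) { super(message); }
    }

    /** Letters form ID tokens, whitespace is skipped, anything else is a lexical error. */
    static class ToyLexer {
        private final String input;
        private int pos = 0;
        final List<String> errors = new ArrayList<>();

        ToyLexer(String input) { this.input = input; }

        /** Error hook: the default records the problem and lets lexing continue. */
        void reportError(String message) { errors.add(message); }

        /** Returns the text of the next ID, or null at end of input. */
        String nextToken() {
            while (pos < input.length()) {
                char c = input.charAt(pos);
                if (Character.isLetter(c)) {
                    int start = pos;
                    while (pos < input.length() && Character.isLetter(input.charAt(pos))) pos++;
                    return input.substring(start, pos);
                }
                pos++; // drop the character, then carry on with the next one
                if (!Character.isWhitespace(c)) {
                    reportError("illegal character '" + c + "' at " + (pos - 1));
                }
            }
            return null;
        }
    }

    /** Bails out of the whole parse on the first lexical error. */
    static class StrictLexer extends ToyLexer {
        StrictLexer(String input) { super(input); }

        @Override
        void reportError(String message) { throw new LexicalBailout(message); }
    }
}
```

Whoever drives the parse catches LexicalBailout once at the top level, so no catch clauses ripple through the rules and the parser never sees a lexical exception as one of its own.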
Does that make sense?
Ter