[antlr-interest] Better recovery from a mismatched token desired

Stanislav Sokorac sokorac at gmail.com
Wed Jul 22 17:38:31 PDT 2009


Thanks, Jim. I looked at the JavaFX grammar, and I like how they did that.
It's a neat way to achieve what I'm trying to do.
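
For anyone reading this in the archive, here is a rough sketch of the idea as
I understand it. It is NOT the JavaFX code and not ANTLR-generated output,
just a small hand-written Java parser for my toy grammar. The class name
ResyncSketch and all of its methods are made up for illustration. The sync()
method plays the role of the empty rule: before each trip around the
statement loop it skips tokens until it sees something that can start a
statement ('let' or 'int') or the token that ends the loop ('end').

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written parser for: 'begin' statement* 'end'
// It only illustrates the "sync before each loop iteration" idea.
class ResyncSketch {
    private static final Set<String> KEYWORDS =
        new HashSet<>(Arrays.asList("begin", "end", "let", "int"));
    // First set of statement, plus the token that follows the loop.
    private static final Set<String> SYNC =
        new HashSet<>(Arrays.asList("let", "int", "end"));

    private final List<String> tokens;
    private int pos = 0;
    public final List<String> parsed = new ArrayList<>();
    public final List<String> errors = new ArrayList<>();

    ResyncSketch(List<String> tokens) { this.tokens = tokens; }

    // Split the input into identifiers, numbers and single punctuation chars.
    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|[0-9]+|\\S").matcher(src);
        while (m.find()) out.add(m.group());
        return out;
    }

    private String la() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    // Stand-in for the empty rule: discard tokens until one is in SYNC.
    private void sync() {
        int start = pos;
        while (pos < tokens.size() && !SYNC.contains(la())) pos++;
        if (pos > start) errors.add("skipped " + tokens.subList(start, pos));
    }

    private boolean match(String expected) {
        if (la().equals(expected)) { pos++; return true; }
        errors.add("expected " + expected + " but found " + la());
        return false;
    }

    private boolean id() {
        if (la().matches("[A-Za-z_][A-Za-z0-9_]*") && !KEYWORDS.contains(la())) { pos++; return true; }
        errors.add("expected ID but found " + la());
        return false;
    }

    private boolean number() {
        if (la().matches("[0-9]+")) { pos++; return true; }
        errors.add("expected NUMBER but found " + la());
        return false;
    }

    // statement : 'let' ID '=' NUMBER ';' | 'int' ID ';'
    private void statement() {
        int start = pos;
        boolean ok = la().equals("let")
            ? match("let") && id() && match("=") && number() && match(";")
            : match("int") && id() && match(";");
        if (ok) parsed.add(String.join(" ", tokens.subList(start, pos)));
    }

    // program : 'begin' sync (statement sync)* 'end'
    public void program() {
        match("begin");
        sync();
        while (pos < tokens.size() && !la().equals("end")) {
            statement();   // la() is 'let' or 'int' here, so progress is made
            sync();        // resync before deciding whether to loop again
        }
        match("end");
    }

    public static void main(String[] args) {
        String src = "begin int a; let a=3; double c; let c =4; end";
        ResyncSketch p = new ResyncSketch(tokenize(src));
        p.program();
        System.out.println("parsed: " + p.parsed);
        System.out.println("errors: " + p.errors);
    }
}
```

With my sample input, the bad 'double c;' statement is skipped as a single
error and the parser stays inside the loop, so the 'let c =4;' after it is
still parsed. In a real ANTLR grammar the skipping would live in the @init
action of the empty rule rather than in hand-written loop code.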

I'm not sure I understand option A, though. Since 'end' needs to be at the
end of the program, it will almost always be in the follow set after a
semi-colon. What kind of jiggering could I do to avoid that?

Stan

On Fri, Jul 17, 2009 at 11:36 PM, Jim Idle <jimi at temporal-wave.com> wrote:

> You need to either:
>
> A) carefully rejigger your grammar so that the follow set does not end
> up being END
>
> B) create an empty rule with an @init that consumes to the follow set and
> so resyncs to what you want, because the follow set of the empty rule is
> the first set of your loop element. Look at the JavaFX compiler for an
> example that is easy enough to follow.
>
> Jim
>
>
> On Jul 17, 2009, at 11:28 AM, Stanislav Sokorac <sokorac at gmail.com> wrote:
>
>  I have a simple grammar (pasted below) for a language that allows two
>> types of statements: let or int, where int declares a variable, and let
>> assigns to it. A "program" is a collection of these statements wrapped in
>> begin/end tokens.
>>
>> My problem is that when ANTLR encounters a token other than 'int' or
>> 'let' as the first word of a statement, it pops out of its statement loop
>> and reports a token mismatch -- "mismatched input 'something' expecting
>> 'end'" -- and then proceeds to consume all tokens until 'end'. None of the
>> statements after the mismatched one are parsed, and I would like to have
>> the rest of the file analyzed.
>>
>> It seems that even if I override the recovery method and consume up to a
>> semicolon, it'll try to match up the next token with 'end' and fail again,
>> as it's no longer even looking for statements. How do I keep the parser
>> inside the statement loop when it detects a mismatched token?
>>
>> Here's my sample input:
>>
>> begin
>> int a;
>> let a=3;
>> double c;
>> let c =4;
>> end
>>
>> (I'd like to see let c=4; parsed, even though 'double c;' is an invalid
>> statement)
>>
>> Here's my simple grammar:
>>
>> grammar test;
>>
>> program    :    'begin' statement* 'end' EOF;
>> statement    :    'let' ID '=' NUMBER ';' | 'int' ID ';';
>>
>> NUMBER    :    ('0'..'9')+;
>> ID  :    ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
>> WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
>>