[antlr-interest] Better recovery from a mismatched token desired

Thu Jul 23 09:38:04 PDT 2009

On Jul 22, 2009, at 2:38 PM, Stanislav Sokorac <sokorac at gmail.com> wrote:

> Thanks, Jim. I looked at the JavaFX grammar, and I like how they did  
> that. It's a neat way to achieve what I'm trying to do.
>

Whoever did that was extremely handsome and clever.

> I'm not sure I understand option A, though. Since 'end' needs to be  
> at the end of the program, it will almost always be in the follow  
> set after a semi-colon. What kind of jiggering could I do to avoid  
> that?

Maybe none; I would have to see your grammar. However, END should be
matched in the top-level rule, with compound statements handled in
rules lower down. Or you could create the resync method manually.

Jim

>
> Stan
>
> On Fri, Jul 17, 2009 at 11:36 PM, Jim Idle <jimi at temporal-wave.com> wrote:
> You need to either:
>
> A) carefully rejigger your grammar so that the follow set does
> not end up being END
>
> B) create an empty rule with an @init action that consumes tokens
> up to the follow set, and so resyncs to what you want, because the
> follow set of the empty rule is the first set of your loop element.
> Look at the JavaFX compiler for an example that is easy enough to
> follow.
>
> Jim
>
>
> On Jul 17, 2009, at 11:28 AM, Stanislav Sokorac <sokorac at gmail.com> wrote:
>
> I have a simple grammar (pasted below) for a language that allows  
> two types of statements: let or int, where int declares a variable,  
> and let assigns to it. A "program" is a collection of these  
> statements wrapped in begin/end tokens.
>
> My problem is that when ANTLR encounters a token other than 'int'
> or 'let' as the first word of a statement, it pops out of its
> statement loop, reports a token mismatch ("mismatched input
> 'something' expecting 'end'"), and then proceeds to consume all
> tokens until 'end'. None of the statements after the mismatched one
> are parsed, and I would like to have the rest of the file
> analyzed.
>
> It seems that even if I override the recovery method and consume up  
> to a semicolon, it'll try to match up the next token with 'end' and  
> fail again, as it's no longer even looking for statements. How do I  
> keep the parser inside the statement loop when it detects a  
> mismatched token?
>
> Here's my sample input:
>
> begin
> int a;
> let a=3;
> double c;
> let c =4;
> end
>
> (I'd like to see 'let c =4;' parsed, even though 'double c;' is an
> invalid statement.)
>
> Here's my simple grammar:
>
> grammar test;
>
> program    :    'begin' statement* 'end' EOF;
> statement    :    'let' ID '=' NUMBER ';' | 'int' ID ';';
>
> NUMBER    :    ('0'..'9')+;
> ID  :    ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
> WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
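
To make Jim's option B concrete without depending on the ANTLR runtime, here is a minimal hand-written sketch (Java 9+). It is not ANTLR-generated code and not the JavaFX implementation; the class name ResyncDemo, the method syncToStatement, and the crude whitespace lexer are all hypothetical. It assumes the empty resync rule is invoked before the loop and after each statement, so its follow set is FIRST(statement) = {'let', 'int'} plus 'end'. EOF is added as a safety stop. Because the resync runs before the loop decision, 'double' is skipped instead of ending the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hand-written sketch (NOT ANTLR-generated) of the grammar
//   program   : 'begin' statement* 'end' EOF ;
//   statement : 'let' ID '=' NUMBER ';' | 'int' ID ';' ;
// with a hypothetical resync step before the loop and after each statement.
public class ResyncDemo {
    // Assumed resync set: FIRST(statement) plus what may follow the loop.
    private static final Set<String> RESYNC = Set.of("let", "int", "end", "<EOF>");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;
    public final List<String> parsed = new ArrayList<>();
    public final List<String> errors = new ArrayList<>();

    public ResyncDemo(String src) {
        // Crude lexer: pad '=' and ';' with spaces, then split on whitespace.
        for (String t : src.replaceAll("([=;])", " $1 ").trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        tokens.add("<EOF>");
    }

    private String la() { return tokens.get(pos); }

    // Match one token: ID and NUMBER are checked by regex, anything else literally.
    private void match(String expected) {
        String t = la();
        boolean ok = expected.equals("ID") ? t.matches("[A-Za-z_][A-Za-z0-9_]*")
                : expected.equals("NUMBER") ? t.matches("[0-9]+")
                : t.equals(expected);
        if (!ok) {
            throw new IllegalStateException("mismatched input '" + t + "' expecting " + expected);
        }
        pos++;
    }

    // Plays the role of the empty rule's @init action: skip tokens until one
    // the loop (or the rule after it) knows how to handle. Stops at EOF.
    private void syncToStatement() {
        if (RESYNC.contains(la())) return;
        errors.add("skipping from '" + la() + "'");
        while (!RESYNC.contains(la())) pos++;
    }

    public void program() {
        match("begin");
        syncToStatement();                        // before the loop decision
        while (la().equals("let") || la().equals("int")) {
            int start = pos;
            try {
                statement();
                parsed.add(String.join(" ", tokens.subList(start, pos)));
            } catch (IllegalStateException e) {
                errors.add(e.getMessage());       // report, then resync below
            }
            syncToStatement();                    // after each statement
        }
        match("end");
        match("<EOF>");
    }

    private void statement() {
        if (la().equals("let")) {
            match("let"); match("ID"); match("="); match("NUMBER"); match(";");
        } else {
            match("int"); match("ID"); match(";");
        }
    }

    public static void main(String[] args) {
        ResyncDemo p = new ResyncDemo("begin int a; let a=3; double c; let c =4; end");
        p.program();
        System.out.println(p.parsed);
        System.out.println(p.errors);
    }
}
```

On the sample input above, main prints [int a ;, let a = 3 ;, let c = 4 ;] and then [skipping from 'double'], so 'let c =4;' is still parsed. Keeping 'end' in the resync set stops the skip from running past the end of the block. In an actual ANTLR 3 grammar, the set would presumably come from the follow set of the empty rule, as Jim describes, rather than being written out by hand.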