[antlr-interest] Better recovery from a mismatched token desired

Fri Jul 17 14:28:06 PDT 2009

I have a simple grammar (pasted below) for a language that allows two types
of statements: let or int, where int declares a variable, and let assigns to
it. A "program" is a collection of these statements wrapped in begin/end
tokens.

My problem is that when ANTLR encounters a token other than 'int' or 'let'
as the first word of a statement, it drops out of its statement loop and
reports a token mismatch ("mismatched input 'something' expecting 'end'").
It then consumes all tokens up to 'end'. None of the statements after the
mismatched one are parsed, and I would like the rest of the file to be
analyzed.

It seems that even if I override the recovery method and consume up to a
semicolon, the parser then tries to match the next token against 'end' and
fails again, because it is no longer looking for statements. How do I keep
the parser inside the statement loop when it detects a mismatched token?

Here's my sample input:

begin
int a;
let a=3;
double c;
let c =4;
end

(I'd like to see 'let c =4;' parsed, even though 'double c;' is an invalid
statement.)

Here's my simple grammar:

grammar test;

program    :    'begin' statement* 'end' EOF;
statement    :    'let' ID '=' NUMBER ';' | 'int' ID ';';

NUMBER    :    ('0'..'9')+;
ID  :    ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
WS  :   ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
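
One possible workaround, sketched here untested and assuming ANTLR 3 with the Java target (which the $channel=HIDDEN action suggests), is to give 'statement' a catch-all alternative. The loop in 'program' then never exits on an unexpected leading token: anything that does not start with 'let', 'int' or 'end' is skipped up to and including the next ';', and parsing resumes with the following statement. The rule name badStatement and the message text are hypothetical:

grammar test;

program      : 'begin' statement* 'end' EOF ;

statement    : 'let' ID '=' NUMBER ';'
             | 'int' ID ';'
             | badStatement      // catch-all so statement* keeps looping
             ;

// Hypothetical error rule: a statement starting with anything other than
// 'let', 'int' or 'end' is consumed through the next ';' (stopping short
// of 'end') and reported, so the following statements are still parsed.
badStatement : ~('let' | 'int' | 'end') (~(';' | 'end'))* ';'
               { System.err.println("line " + $start.getLine()
                     + ": skipped invalid statement '" + $text + "'"); }
             ;

NUMBER : ('0'..'9')+ ;
ID     : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
WS     : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;

With the sample input, 'double' is lexed as an ID, so 'double c;' arrives as ID ID ';'. That matches badStatement, and 'let c =4;' is then parsed normally. One trade-off is that these reports come from a grammar action rather than from ANTLR's own reportError(). As written, they are therefore not counted by getNumberOfSyntaxErrors().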