[antlr-interest] Question of Repeated tokens and early termination

Wed Feb 2 12:32:11 PST 2011

Your grammar does not mention the EOF token. (more below...)
On Wed, 2011-02-02 at 16:18 -0300, Victor Giordano wrote:
> Hi there. I am having trouble with error handling.
> I have a grammar for recognizing linear expressions, and it works great!
> The grammar for a linear expression is the following:
> 
> tokens
> {
> 	PLUS 	= '+';
> 	MINUS 	= '-';
> 	MUL		= '*';
> 	DIV		= '/';
> }
> 
> linexpr : (MINUS|PLUS)? linterm ((PLUS|MINUS) linterm)*;
> linterm : factor? ID;
> 
> expr returns [double value]
> 	: e=term {$value = $e.value;}
> 	(	PLUS e=term {$value += $e.value;}
> 	|	MINUS e=term {$value -= $e.value;}
> 	)*;
> 
> term returns [double value]
> 	: f=factor {$value = $f.value;}
> 	(	MUL f=factor {$value *= $f.value;}
> 	|	DIV f=factor {$value /= $f.value;}
> 	)*;
> 
> factor returns [double value]
> 	: DOUBLE {$value = Double.parseDouble($DOUBLE.text);}
> 	| '(' e=expr ')'{$value = $e.value;};
> 	
> ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
> 
> DOUBLE
> 	:   ('0'..'9')+
> 	|	('0'..'9')+ '.' ('0'..'9')* EXPONENT?
>      |   '.' ('0'..'9')+ EXPONENT?
>      |   ('0'..'9')+ EXPONENT
>      ;
> 
> fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
> 
> NEWLINE:'\r'? '\n' { $channel = HIDDEN; };
> 
> WS  :   (' '|'\t'|'\n'|'\r')+ { $channel = HIDDEN; };
> 
> 
> But the problem occurs when, for example, I have:
> "x x x"
> 
> Then the parser stops after processing the first "x".
> How do I correctly emit an invalid-syntax error?
> I tried catching EarlyExitException, but it doesn't work.
> I want to catch this inside my Java application and show it to the end 
> user.
> Something like this...
> // line is equal to the user input...
> 
>              CharStream cs = new ANTLRStringStream(line);
>              LinearExpressionLexer lexer = new LinearExpressionLexer(cs);
>              CommonTokenStream tokens = new CommonTokenStream(lexer);
>              LinearExpressionParser parser = new LinearExpressionParser(tokens);
>              res = parser.linexpr(); // this is supposed to fail here, but it doesn't.
> Actually, linexpr does return data whose type is a 
> custom class called LinearExpresion. I omitted the return from the 
> linexpr parser rule to simplify things.
> 
> I hope someone can help me.
> Greetings and thanks in advance.

Greetings!

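By design, ANTLR parsers stop after consuming the longest possible VALID
input sequence. I believe the rationale for this is that any remaining
input will be available for some other tool to process. In your case,
after the first "x" the loop ((PLUS|MINUS) linterm)* sees an ID rather
than PLUS or MINUS, so it simply exits and linexpr returns normally.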

If you want ANTLR to try to process the entire input, reporting and
recovering from syntax errors in the input, you must tell it to do that.

Referring to the EOF token (a special built-in token) in your
top-most rule will cause ANTLR to consume the entire input string. That
is, the parse will not have a valid input until the EOF is seen, and so
it will consume all of the input sentence.

I suggest adding a top-level rule similar to:

start : linexpr EOF! ;

and then call parser.start() instead of parser.linexpr() in your driver.

(Note: the ! meta-character after the EOF token above keeps the EOF
out of any AST produced. You do not seem to be building an AST, so
plain EOF without the ! should work just as well.)
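
To make the difference concrete, here is a small hand-written Java
sketch of the two entry points. It is NOT ANTLR-generated code and does
not use the ANTLR runtime. The class name LinExprSketch, the
whitespace-based tokenizer, and the simplified rule (no factor) are all
assumptions made for illustration. It also throws on the first error,
whereas a generated ANTLR parser would normally report the error and
try to recover.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-written stand-in for the generated parser. */
class LinExprSketch {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    LinExprSketch(String input) {
        // Assumption: tokens are separated by whitespace.
        for (String t : input.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private static boolean isSign(String t) {
        return t.equals("+") || t.equals("-");
    }

    private static boolean isId(String t) {
        return t.matches("[A-Za-z_][A-Za-z0-9_]*");
    }

    private void matchId() {
        if (!isId(peek())) {
            throw new IllegalStateException("expected ID but found '" + peek() + "'");
        }
        pos++;
    }

    // Simplified: linexpr : (MINUS|PLUS)? ID ((PLUS|MINUS) ID)* ;
    void linexpr() {
        if (isSign(peek())) pos++;
        matchId();
        while (isSign(peek())) { // exits quietly on any other token
            pos++;
            matchId();
        }
    }

    // start : linexpr EOF ;
    void start() {
        linexpr();
        if (pos < tokens.size()) { // leftover input means EOF did not match
            throw new IllegalStateException(
                "unexpected token '" + peek() + "' at index " + pos);
        }
    }

    int consumed() {
        return pos;
    }

    public static void main(String[] args) {
        LinExprSketch p = new LinExprSketch("x x x");
        p.linexpr(); // returns normally after the first "x"
        System.out.println("linexpr consumed " + p.consumed() + " of 3 tokens");
        try {
            new LinExprSketch("x x x").start();
        } catch (IllegalStateException e) {
            System.out.println("start rejected input: " + e.getMessage());
        }
    }
}
```

With "x x x", linexpr() stops after one token without complaint, while
start() rejects the input because the second "x" is not EOF.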

Hope this helps...
   -jbb