[antlr-interest] Question of Repeated tokens and early termination

Wed Feb 2 13:50:36 PST 2011

Thanks for your answer, John. Your explanation was very clear and most
welcome!
Things seem to be going better because now the parser is throwing a
MissingTokenException.
I will catch it and use it for the cause!
Greetings and thanks again!
Víctor.

On 02/02/2011 05:32 p.m., John B. Brodie wrote:
> Your grammar does not mention the EOF token. (more below...)
> On Wed, 2011-02-02 at 16:18 -0300, Victor Giordano wrote:
>> Hi there. I am having trouble with error handling.
>> I have a grammar for recognizing linear expressions, and it works great!
>> The grammar for a linear expression is the following:
>>
>> tokens
>> {
>> 	PLUS 	= '+';
>> 	MINUS 	= '-';
>> 	MUL		= '*';
>> 	DIV		= '/';
>> }
>>
>> linexpr : (MINUS|PLUS)? linterm ((PLUS|MINUS) linterm)*;
>> linterm : factor? ID;
>>
>> expr returns [double value]
>> 	: e=term {$value = $e.value;}
>> 	(	PLUS e=term {$value += $e.value;}
>> 	|	MINUS e=term {$value -= $e.value;}
>> 	)*;
>>
>> term returns [double value]
>> 	: f=factor {$value = $f.value;}
>> 	(	MUL f=factor {$value *= $f.value;}
>> 	|	DIV f=factor {$value /= $f.value;}
>> 	)*;
>>
>> factor returns [double value]
>> 	: DOUBLE {$value = Double.parseDouble($DOUBLE.text);}
>> 	| '(' e=expr ')'{$value = $e.value;};
>> 	
>> ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
>>
>> DOUBLE
>> 	:   ('0'..'9')+
>> 	|	('0'..'9')+ '.' ('0'..'9')* EXPONENT?
>>       |   '.' ('0'..'9')+ EXPONENT?
>>       |   ('0'..'9')+ EXPONENT
>>       ;
>>
>> fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
>>
>> NEWLINE:'\r'? '\n' { $channel = HIDDEN; };
>>
>> WS  :   (' '|'\t'|'\n'|'\r')+ { $channel = HIDDEN; };
>>
>>
>> But the problem occurs when, for example, I have:
>> "x x x"
>>
>> Then the parser stops after processing the first "x".
>> How do I correctly report an invalid syntax error?
>> I tried catching EarlyExitException, but it doesn't work.
>> I want to catch this inside my Java application and show it to the end
>> user.
>> Something like this...
>> // line is equal to the user input...
>>
>>               CharStream cs = new ANTLRStringStream(line);
>>               LinearExpressionLexer lexer = new LinearExpressionLexer(cs);
>>               CommonTokenStream tokens = new CommonTokenStream(lexer);
>>               LinearExpressionParser parser = new LinearExpressionParser(tokens);
>>               res = parser.linexpr(); // and here, it's supposed to fail, but it doesn't.
>> Actually, linexpr does return some kind of data whose type is a
>> custom class called LinearExpresion. I omitted the return value from
>> the linexpr parser rule to simplify things.
>>
>> I hope someone can help me.
>> Greetings and thanks in advance.
>
> Greetings!
>
> By design, ANTLR parsers stop after consuming the longest possible VALID
> input sequence. I believe the rationale for this is that any remaining
> input will be available for some other tool to process.
>
> If you want ANTLR to try to process the entire input, reporting and
> recovering from syntax errors in the input, you must tell it to do that.
>
> Referring to the EOF token (a special built-in token) in your
> top-most rule will cause ANTLR to consume the entire input string. That
> is, the parse will not have a valid input until the EOF is seen, and so
> it will consume all of the input sentence.
>
> I suggest adding a top-level rule similar to:
>
> start : linexpr EOF! ;
>
> and then call parser.start() instead of parser.linexpr() in your driver.
>
> (note the ! meta-character after the EOF token above will keep the EOF
> out of any AST produced, but you do not seem to be building an AST so it
> won't make any difference...)
>
> Hope this helps...
>     -jbb
>
>
>
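
The difference between calling linexpr directly and calling a start rule that ends in EOF can be illustrated without ANTLR at all. What follows is only a hand-written recursive-descent sketch, not ANTLR-generated code: the class name EofDemo, the whitespace-based tokenizer, the simplified linterm (a plain number instead of the factor rule), and the use of IllegalStateException are all assumptions made for illustration. It mirrors the shape of the two rules discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generated parser; tokens are split on whitespace.
public class EofDemo {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    public EofDemo(String input) {
        for (String t : input.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
    }

    // Returns the current token, or a marker string once the input is exhausted.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private static boolean isSign(String t) { return t.equals("+") || t.equals("-"); }
    private static boolean isId(String t) { return t.matches("[A-Za-z_][A-Za-z0-9_]*"); }
    private static boolean isNumber(String t) { return t.matches("[0-9]+(\\.[0-9]*)?"); }

    // linterm : NUMBER? ID ;   (simplified from "factor? ID")
    private void linterm() {
        if (isNumber(peek())) pos++;
        if (!isId(peek())) {
            throw new IllegalStateException("expected ID but found " + peek());
        }
        pos++;
    }

    // linexpr : ('+'|'-')? linterm (('+'|'-') linterm)* ;
    // Stops quietly at the first token that cannot extend the expression.
    public int linexpr() {
        if (isSign(peek())) pos++;
        linterm();
        while (isSign(peek())) {
            pos++;
            linterm();
        }
        return pos; // number of tokens consumed
    }

    // start : linexpr EOF ;
    // Leftover tokens are now a syntax error instead of being ignored.
    public int start() {
        linexpr();
        if (pos < tokens.size()) {
            throw new IllegalStateException("expected EOF but found " + peek());
        }
        return pos;
    }

    public static void main(String[] args) {
        // Without EOF: only the first "x" is consumed and no error is raised.
        System.out.println(new EofDemo("x x x").linexpr());
        // With EOF: the second "x" is reported as unexpected.
        try {
            new EofDemo("x x x").start();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch, linexpr on "x x x" returns after one token, which matches the behavior Victor observed, while start rejects the same input because the end of input is not reached. How a real ANTLR-generated parser surfaces that error (reported and recovered from, or thrown) depends on the runtime's error-handling configuration.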