[antlr-interest] Catching errors

Wed Feb 2 18:22:24 PST 2011

Okay. So adding an EOF forces the parser to go to the end of the input, 
looking for further tokens in the correct order.

1) But I still have a problem. Consider the following grammar:

grammar LinearMath;

tokens
{
     PLUS     = '+';
     MINUS     = '-';
     MUL        = '*';
     DIV        = '/';
}

inecuation:	linexpr ((RELATIONSHIP) linexpr)+ EOF!;
catch [UnwantedTokenException ute]
{
	System.out.println("inecuation UnwantedTokenException  " + ute.toString());
	throw ute;
}

linexpr : (MINUS|PLUS)? linterm ((PLUS|MINUS) linterm)* EOF;

linterm : factor? ID;

expr returns [double value]
     : e=term {$value = $e.value;}
     (    PLUS e=term {$value += $e.value;}
     |    MINUS e=term {$value -= $e.value;}
     )*;

term returns [double value]
     : f=factor {$value = $f.value;}
     (    MUL f=factor {$value *= $f.value;}
     |    DIV f=factor {$value /= $f.value;}
     )*;

factor returns [double value]
     : DOUBLE {$value = Double.parseDouble($DOUBLE.text);}
     | '(' e=expr ')'{$value = $e.value;};

ID  :    ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;

DOUBLE
     :   ('0'..'9')+
     |    ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
       |   '.' ('0'..'9')+ EXPONENT?
       |   ('0'..'9')+ EXPONENT
       ;

fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

NEWLINE:'\r'? '\n' { $channel = HIDDEN; };

WS  :   (' '|'\t'|'\n'|'\r')+ { $channel = HIDDEN; };

RELATIONSHIP :	'<'|'<='|'='|'>'|'>=';

and with the following input: "x < y x"
which isn't a valid inequality, because "y x" needs a binary arithmetic 
operator (PLUS or MINUS) between the two terms. The parser does its job 
very well: it consumes "x", then "<", then "y", and when it reaches the 
second "x" it raises an UnwantedTokenException. The thing is, I am not 
able to catch it and display an error to the end user. Note that I am 
using the "inecuation" rule to parse that input.

I hope someone can help me with this again.

2) The other thing is about invalid tokens. I managed to handle them by 
overriding a lexer member function called nextToken(), like this:

@lexer::members
{
	@Override
	public Token nextToken()
	{
		while (true) {
			state.token = null;
			state.channel = Token.DEFAULT_CHANNEL;
			state.tokenStartCharIndex = input.index();
			state.tokenStartCharPositionInLine = input.getCharPositionInLine();
			state.tokenStartLine = input.getLine();
			state.text = null;
			if ( input.LA(1)==CharStream.EOF ) {
				return Token.EOF_TOKEN;
			}
			try {
				mTokens();
				if ( state.token==null ) {
					emit();
				}
				else if ( state.token==Token.SKIP_TOKEN ) {
					continue;
				}
				return state.token;
			}
			catch (RecognitionException re) {
				reportError(re);
				throw new RuntimeException("Invalid Character  : " + (char) (re.c)); // or throw Error
			}
		}
	}
}

Is that the correct way?

Well, that is all!
Thanks in advance!
Victor

On 02/02/2011 05:32 p.m., John B. Brodie wrote:
> Your grammar does not mention the EOF token. (more below...)
> On Wed, 2011-02-02 at 16:18 -0300, Victor Giordano wrote:
>> Hi there. I am having trouble with error handling.
>> I have a grammar for recognizing linear expressions, and it works great!
>> The grammar for a linear expression is the following:
>>
>> tokens
>> {
>> 	PLUS 	= '+';
>> 	MINUS 	= '-';
>> 	MUL		= '*';
>> 	DIV		= '/';
>> }
>>
>> linexpr : (MINUS|PLUS)? linterm ((PLUS|MINUS) linterm)*;
>> linterm : factor? ID;
>>
>> expr returns [double value]
>> 	: e=term {$value = $e.value;}
>> 	(	PLUS e=term {$value += $e.value;}
>> 	|	MINUS e=term {$value -= $e.value;}
>> 	)*;
>>
>> term returns [double value]
>> 	: f=factor {$value = $f.value;}
>> 	(	MUL f=factor {$value *= $f.value;}
>> 	|	DIV f=factor {$value /= $f.value;}
>> 	)*;
>>
>> factor returns [double value]
>> 	: DOUBLE {$value = Double.parseDouble($DOUBLE.text);}
>> 	| '(' e=expr ')'{$value = $e.value;};
>> 	
>> ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
>>
>> DOUBLE
>> 	:   ('0'..'9')+
>> 	|	('0'..'9')+ '.' ('0'..'9')* EXPONENT?
>>       |   '.' ('0'..'9')+ EXPONENT?
>>       |   ('0'..'9')+ EXPONENT
>>       ;
>>
>> fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
>>
>> NEWLINE:'\r'? '\n' { $channel = HIDDEN; };
>>
>> WS  :   (' '|'\t'|'\n'|'\r')+ { $channel = HIDDEN; };
>>
>>
>> But the problem occurs when, for example, I have:
>> "x x x"
>>
>> Then the parser stops after processing the first "x".
>> How do I correctly emit an invalid-syntax error?
>> I tried catching EarlyExitException, but it doesn't work.
>> I want to catch this inside my Java application and show it to the end
>> user.
>> Something like this...
>> // line is equal to the user input...
>>
>>               CharStream cs = new ANTLRStringStream(line);
>>               LinearExpressionLexer lexer = new LinearExpressionLexer(cs);
>>               CommonTokenStream tokens = new CommonTokenStream(lexer);
>>               LinearExpressionParser parser = new LinearExpressionParser(tokens);
>>               // this is supposed to fail here, but it doesn't
>>               res = parser.linexpr();
>> Actually, linexpr does return some data, whose type is a custom class
>> called LinearExpresion. I omitted the return from the linexpr parser
>> rule to simplify things.
>>
>> I hope someone can help me.
>> Greetings, and thanks in advance.
>
> Greetings!
>
> By design, ANTLR parsers stop after consuming the longest possible VALID
> input sequence. I believe the rationale for this is that any remaining
> input will be available for some other tool to process.
>
> If you want ANTLR to try to process the entire input, reporting and
> recovering from syntax errors in the input; you must tell it to do that.
>
> Referring to the EOF token (a special built-in token) in your
> top-most rule will cause ANTLR to consume the entire input string. I.e.
> the parse will not have a valid input until the EOF is seen, and so will
> consume all of the input sentence.
>
> I suggest adding a top-level rule similar to:
>
> start : linexpr EOF! ;
>
> and then call parser.start() instead of parser.linexpr() in your driver.
>
> (note the ! meta-character after the EOF token above will keep the EOF
> out of any AST produced, but you do not seem to be building an AST so it
> won't make any difference...)
>
> Hope this helps...
>     -jbb
>
>
>
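The "longest valid prefix" versus "anchored at EOF" behaviour described in the quoted reply can be mimicked without ANTLR. The following is only a hand-written sketch, not code that ANTLR generates: the class LinexprSketch and its methods are hypothetical names, tokens are assumed to be separated by whitespace, and linterm is simplified to a bare ID (the optional factor is dropped).

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for the linexpr rule; this is NOT ANTLR-generated code.
final class LinexprSketch {

    // Assumption: tokens are separated by whitespace, so "x x x" yields three tokens.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : line.trim().split("\\s+")) {
            if (!chunk.isEmpty()) {
                tokens.add(chunk);
            }
        }
        return tokens;
    }

    static boolean isSign(String t) {
        return t.equals("+") || t.equals("-");
    }

    static boolean isId(String t) {
        return t.matches("[A-Za-z_][A-Za-z0-9_]*");
    }

    // Consumes one ID at position pos, or reports a syntax error.
    static int expectId(List<String> tokens, int pos) {
        if (pos >= tokens.size() || !isId(tokens.get(pos))) {
            throw new IllegalArgumentException("expected ID at token index " + pos);
        }
        return pos + 1;
    }

    // Simplified rule: linexpr : (MINUS|PLUS)? ID ((PLUS|MINUS) ID)* ;
    // Returns how many tokens were consumed. With requireEof == true it behaves
    // like "start : linexpr EOF ;" and rejects any leftover token.
    static int parse(List<String> tokens, boolean requireEof) {
        int pos = 0;
        if (pos < tokens.size() && isSign(tokens.get(pos))) {
            pos++; // optional leading sign
        }
        pos = expectId(tokens, pos);
        while (pos < tokens.size() && isSign(tokens.get(pos))) {
            pos = expectId(tokens, pos + 1); // operator followed by another ID
        }
        if (requireEof && pos < tokens.size()) {
            throw new IllegalArgumentException(
                "unexpected token '" + tokens.get(pos) + "' at token index " + pos);
        }
        return pos;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("x x x");
        // Without the EOF check the parse stops quietly after the first "x".
        System.out.println(parse(tokens, false));
        try {
            parse(tokens, true);
        } catch (IllegalArgumentException e) {
            // With the EOF check the leftover "x" becomes a reportable error.
            System.out.println("syntax error: " + e.getMessage());
        }
    }
}
```

Without the end-of-input check, "x x x" is accepted as the one-token prefix "x" and nothing is reported. With the check, the second "x" surfaces as an exception that the calling application can catch and show to the user.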