[antlr-interest] Catching errors
Victor Giordano
power_giordo at yahoo.com.ar
Wed Feb 2 18:22:24 PST 2011
Okay. So adding an EOF forces the parser to go to the end of the input,
looking for further tokens in the correct order.
1) But I still have a problem. Consider the following grammar:
grammar LinearMath;
tokens
{
PLUS = '+';
MINUS = '-';
MUL = '*';
DIV = '/';
}
inecuation: linexpr ((RELATIONSHIP) linexpr)+ EOF!;
catch [UnwantedTokenException ute]
{
System.out.println ("inecuation UnwantedTokenException " +
ute.toString());
throw ute;
}
linexpr : (MINUS|PLUS)? linterm ((PLUS|MINUS) linterm)*;
linterm : factor? ID;
expr returns [double value]
: e=term {$value = $e.value;}
( PLUS e=term {$value += $e.value;}
| MINUS e=term {$value -= $e.value;}
)*;
term returns [double value]
: f=factor {$value = $f.value;}
( MUL f=factor {$value *= $f.value;}
| DIV f=factor {$value /= $f.value;}
)*;
factor returns [double value]
: DOUBLE {$value = Double.parseDouble($DOUBLE.text);}
| '(' e=expr ')'{$value = $e.value;};
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
DOUBLE
: ('0'..'9')+
| ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
| '.' ('0'..'9')+ EXPONENT?
| ('0'..'9')+ EXPONENT
;
fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
NEWLINE:'\r'? '\n' { $channel = HIDDEN; };
WS : (' '|'\t'|'\n'|'\r')+ { $channel = HIDDEN; };
RELATIONSHIP : '<'|'<='|'='|'>'|'>=';
and the following input: "x < y x"
That isn't a valid inequality, because "y x" needs a binary arithmetic
operator (PLUS or MINUS) between the two identifiers. The parser does its
job very well: it consumes the "x", then "<", then "y", and when it
reaches the second "x" it emits an "UnwantedTokenException". The thing
is, I am not able to catch it and display an error to the end user. Note
that I am using the "inecuation" rule to parse that input.
I hope someone can help me with this again.
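A possible explanation, offered as a sketch rather than a definitive answer: in ANTLR 3 a single extra token such as the second "x" appears to be handled inside the generated match() call. The runtime builds the UnwantedTokenException, hands it to reportError(), skips the token and carries on, so nothing is thrown and the catch clause on the rule never runs. The plain-Java model below imitates that behavior. RecoverySketch, its match() method and the strict flag are hypothetical names made up for illustration; they are not ANTLR classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of "report and recover" versus "throw"; not an ANTLR class.
final class RecoverySketch {
    static final String EOF = "<EOF>";

    final List<String> errors = new ArrayList<>(); // reported, never thrown
    private final List<String> tokens;
    private final boolean strict; // true: throw instead of recovering
    private int pos = 0;

    RecoverySketch(List<String> tokens, boolean strict) {
        this.tokens = tokens;
        this.strict = strict;
    }

    // Token type at index i, or EOF once the input is exhausted.
    private String tokenAt(int i) {
        return i < tokens.size() ? tokens.get(i) : EOF;
    }

    // Matches one token type; on a mismatch, tries single-token deletion.
    void match(String expected) {
        if (tokenAt(pos).equals(expected)) {
            pos++;
            return;
        }
        String msg = "unwanted token " + tokenAt(pos) + " at index " + pos
                + ", expected " + expected;
        if (!strict && tokenAt(pos + 1).equals(expected)) {
            errors.add(msg); // the caller sees no exception
            pos += 2;        // drop the extra token, consume the expected one
            return;
        }
        throw new IllegalStateException(msg);
    }

    // Simplified stand-in for: inecuation : ID RELATIONSHIP ID EOF ;
    void inecuation() {
        match("ID");
        match("RELATIONSHIP");
        match("ID");
        match(EOF);
    }

    public static void main(String[] args) {
        // Token types for the input "x < y x"
        List<String> input = Arrays.asList("ID", "RELATIONSHIP", "ID", "ID");

        RecoverySketch lenient = new RecoverySketch(input, false);
        lenient.inecuation(); // returns normally despite the extra token
        System.out.println("errors reported: " + lenient.errors.size());

        try {
            new RecoverySketch(input, true).inecuation();
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

With the real parser, checking getNumberOfSyntaxErrors() after the call to inecuation() should show whether anything was reported. The strict variant corresponds roughly to overriding recoverFromMismatchedToken() in an @members block so that it throws; the exact signature is worth checking against the runtime version in use.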
2) The other thing is about invalid tokens. I managed to handle them by
overriding a member function of the lexer called nextToken(), like this:
@lexer::members
{
@Override
public Token nextToken()
{
while (true) {
state.token = null;
state.channel = Token.DEFAULT_CHANNEL;
state.tokenStartCharIndex = input.index();
state.tokenStartCharPositionInLine = input.getCharPositionInLine();
state.tokenStartLine = input.getLine();
state.text = null;
if ( input.LA(1)==CharStream.EOF ) {
return Token.EOF_TOKEN;
}
try {
mTokens();
if ( state.token==null ) {
emit();
}
else if ( state.token==Token.SKIP_TOKEN ) {
continue;
}
return state.token;
}
catch (RecognitionException re) {
reportError(re);
throw new RuntimeException("Invalid Character : " + (char) (re.c));
// or throw Error
}
}
}
}
Is that the correct way?
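The idea behind the override above is to fail fast: stop at the first character that no lexer rule accepts and raise an unchecked exception, since nextToken() declares no checked ones. A minimal sketch of that idea in plain Java, without ANTLR, might look like the following. InvalidCharSketch, InvalidCharacterException and tokenize() are hypothetical names, and the token rules are much simpler than the grammar's (for example, "<=" comes out as two tokens).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written tokenizer, used only to illustrate fail-fast lexing.
final class InvalidCharSketch {

    // Unchecked, so it can leave a method that declares no checked exceptions.
    static final class InvalidCharacterException extends RuntimeException {
        final char character;
        final int index;

        InvalidCharacterException(char character, int index) {
            super("Invalid character '" + character + "' at index " + index);
            this.character = character;
            this.index = index;
        }
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '.';
    }

    // Splits the input into word-like tokens and single-character operators.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // skipped, like a hidden whitespace channel
            } else if (isWordChar(c)) {
                int start = i; // one run of word characters, ID or number alike
                while (i < input.length() && isWordChar(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else if ("+-*/()<>=".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                // Fail fast: report the offending character and where it was found.
                throw new InvalidCharacterException(c, i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x < y"));
        try {
            tokenize("x # y");
        } catch (InvalidCharacterException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Within ANTLR 3 itself, a shorter route that may be worth trying is to override reportError(RecognitionException) in @lexer::members so that it throws, instead of copying the whole body of nextToken(). That is a suggestion to verify against your runtime version, not a confirmed recipe.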
Well, that is all!
Thanks in advance.
Victor
On 02/02/2011 05:32 p.m., John B. Brodie wrote:
> Your grammar does not mention the EOF token. (more below...)
> On Wed, 2011-02-02 at 16:18 -0300, Victor Giordano wrote:
>> Hi there. I am having trouble with error handling.
>> I have a grammar for recognizing linear expressions, and it works great!
>> The grammar for a linear expression is the following:
>>
>> tokens
>> {
>> PLUS = '+';
>> MINUS = '-';
>> MUL = '*';
>> DIV = '/';
>> }
>>
>> linexpr : (MINUS|PLUS)? linterm ((PLUS|MINUS) linterm)*;
>> linterm : factor? ID;
>>
>> expr returns [double value]
>> : e=term {$value = $e.value;}
>> ( PLUS e=term {$value += $e.value;}
>> | MINUS e=term {$value -= $e.value;}
>> )*;
>>
>> term returns [double value]
>> : f=factor {$value = $f.value;}
>> ( MUL f=factor {$value *= $f.value;}
>> | DIV f=factor {$value /= $f.value;}
>> )*;
>>
>> factor returns [double value]
>> : DOUBLE {$value = Double.parseDouble($DOUBLE.text);}
>> | '(' e=expr ')'{$value = $e.value;};
>>
>> ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
>>
>> DOUBLE
>> : ('0'..'9')+
>> | ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
>> | '.' ('0'..'9')+ EXPONENT?
>> | ('0'..'9')+ EXPONENT
>> ;
>>
>> fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
>>
>> NEWLINE:'\r'? '\n' { $channel = HIDDEN; };
>>
>> WS : (' '|'\t'|'\n'|'\r')+ { $channel = HIDDEN; };
>>
>>
>> But the problem occurs when, for example, I have:
>> "x x x"
>>
>> Then the parser stops after processing the first "x".
>> How do I correctly emit an invalid-syntax error?
>> I tried catching EarlyExitException, but it doesn't work.
>> I want to catch this inside my Java application and show it to the end
>> user.
>> Something like this...
>> // line holds the user input
>>
>> CharStream cs = new ANTLRStringStream(line);
>> LinearExpressionLexer lexer = new LinearExpressionLexer(cs);
>> CommonTokenStream tokens = new CommonTokenStream(lexer);
>> LinearExpressionParser parser = new LinearExpressionParser(tokens);
>> // this call is supposed to fail, but it doesn't
>> res = parser.linexpr();
>> Actually, linexpr does return some data, whose type is a custom class
>> called LinearExpresion. I omitted the return value from the linexpr
>> parser rule to simplify things.
>>
>> I hope someone can help me.
>> Greetings, and thanks in advance.
>
> Greetings!
>
> By design, ANTLR parsers stop after consuming the longest possible VALID
> input sequence. I believe the rationale for this is that any remaining
> input will be available for some other tool to process.
>
> If you want ANTLR to try to process the entire input, reporting and
> recovering from syntax errors in the input, you must tell it to do that.
>
> Referring to the EOF token (a special built-in token) in your
> top-most rule will cause ANTLR to consume the entire input string. I.e.
> the parse will not have a valid input until the EOF is seen, and so will
> consume all of the input sentence.
>
> I suggest adding a top-level rule similar to:
>
> start : linexpr EOF! ;
>
> and then call parser.start() instead of parser.linexpr() in your driver.
>
> (note the ! meta-character after the EOF token above will keep the EOF
> out of any AST produced, but you do not seem to be building an AST so it
> won't make any difference...)
>
> Hope this helps...
> -jbb
>
>
>
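The effect of the EOF reference described in the reply above can be modelled in plain Java. This is only a sketch under simplifying assumptions: EofAnchorSketch and its methods are hypothetical, the tokens are pre-split strings, and linexpr is reduced to (PLUS|MINUS)? ID ((PLUS|MINUS) ID)*. Without the end-of-input check the recognizer accepts the valid prefix of "x x x" and stops; with it, the leftover tokens become an error.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical hand-written recognizer; it only models the role of EOF.
final class EofAnchorSketch {

    static boolean isSign(String t) {
        return t.equals("+") || t.equals("-");
    }

    // Consumes one ID (any token that is not a sign) and returns the next index.
    static int id(List<String> tokens, int pos) {
        if (pos >= tokens.size() || isSign(tokens.get(pos))) {
            throw new IllegalArgumentException("expected ID at token " + pos);
        }
        return pos + 1;
    }

    // Like "linexpr" without EOF: returns the index of the first token
    // it did NOT consume, so trailing input is silently left over.
    static int linexpr(List<String> tokens) {
        int pos = 0;
        if (pos < tokens.size() && isSign(tokens.get(pos))) {
            pos++; // optional leading sign
        }
        pos = id(tokens, pos);
        while (pos < tokens.size() && isSign(tokens.get(pos))) {
            pos = id(tokens, pos + 1);
        }
        return pos;
    }

    // Like "start : linexpr EOF ;": leftover tokens are an error.
    static void start(List<String> tokens) {
        int pos = linexpr(tokens);
        if (pos != tokens.size()) {
            throw new IllegalArgumentException(
                    "unexpected token '" + tokens.get(pos) + "' at token " + pos);
        }
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("x", "x", "x");
        System.out.println("tokens consumed without EOF check: " + linexpr(input));
        try {
            start(input);
        } catch (IllegalArgumentException e) {
            System.out.println("with EOF check: " + e.getMessage());
        }
    }
}
```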