[antlr-interest] Parser behavior on invalid input - Newbie question
bchagenbuch
bhagenbuch at didera.com
Fri Mar 28 13:23:57 PST 2003
I ran into some behavior that puzzles a newbie like me while writing a rule to parse compound names of length 1 to 3, such as "a", "a.b", or "a.b.c". I tried 3 different formulations:
The first one has the disadvantage of recognizing names that are too long, but is fine otherwise.
name1 : ID ( DOT ID )* { my_action(); } ;
The other two recognize exactly what I want:
name2 : ID ( DOT ID ( DOT ID )? )? { my_action(); } ;
name3 : ID ( part1 | ) { my_action(); } ;
part1 : DOT ID ( part2 | ) ;
part2 : DOT ID ;
On certain invalid inputs, however, my_action() isn't executed, and I think it should be.
In particular, when name2 or name3 sees "a.b.c BAD-STUFF",
- my_action() is executed, then
- the bad stuff causes an exception, as expected,
BUT
when name2 or name3 sees "a.b BAD-STUFF" or "a BAD-STUFF",
- the bad stuff causes an exception, and my_action() is skipped.
When I look at the generated parser, I can see why this occurs, but when I look at the rules in the grammar, I can't.
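The pattern can be reproduced with a hand-written Java sketch of two decision shapes. This assumes ANTLR 2.x's usual code generation and is not verbatim ANTLR output. In that shape, a `( ... )*` loop is a do/while that silently breaks on any token that cannot start another iteration. A `( ... )?` block becomes a switch with a case for the follow set (CLOSE here) and a default that throws before the action is reached. `( part1 | )` in name3 is assumed to get the same kind of switch. The class `NameRules`, its methods, and the string token types are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch (NOT real ANTLR output) of the two decision shapes
// assumed for ANTLR 2.x: a closure loop vs. an optional subrule.
class NameRules {
    private final List<String> tokens; // token types: OPEN, CLOSE, DOT, ID
    private int pos = 0;
    boolean actionRan = false;

    NameRules(List<String> tokens) { this.tokens = tokens; }

    // Lookahead: la(i) returns the type of the i-th upcoming token.
    private String la(int i) {
        int idx = pos + i - 1;
        return idx < tokens.size() ? tokens.get(idx) : "EOF";
    }

    private void match(String type) {
        if (!la(1).equals(type)) {
            throw new IllegalStateException("expected " + type + ", found " + la(1));
        }
        pos++;
    }

    // name1 : ID ( DOT ID )* {action} ;
    // The loop exits on ANY token that is not DOT, so the action always runs
    // and a bad token is only reported later, by start1's match(CLOSE).
    void name1() {
        match("ID");
        while (la(1).equals("DOT")) {
            match("DOT");
            match("ID");
        }
        actionRan = true;
    }

    // name2 : ID ( DOT ID ( DOT ID )? )? {action} ;
    // Each optional block is a switch: DOT enters it, CLOSE (the follow set)
    // skips it, and anything else throws before the action is reached.
    void name2() {
        match("ID");
        switch (la(1)) {
            case "DOT":
                match("DOT");
                match("ID");
                switch (la(1)) {
                    case "DOT": match("DOT"); match("ID"); break;
                    case "CLOSE": break;
                    default: throw new IllegalStateException("no viable alt at " + la(1));
                }
                break;
            case "CLOSE":
                break;
            default:
                throw new IllegalStateException("no viable alt at " + la(1));
        }
        actionRan = true;
    }

    // Minimal tokenizer for the test inputs; spaces are skipped like WS.
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (c == '(') out.add("OPEN");
            else if (c == ')') out.add("CLOSE");
            else if (c == '.') out.add("DOT");
            else if (c >= 'a' && c <= 'z') out.add("ID");
            else if (c != ' ') throw new IllegalArgumentException("bad char: " + c);
        }
        return out;
    }

    // Mimics start1/start2 (OPEN name CLOSE) and reports whether the action ran.
    static String run(String rule, String input) {
        NameRules p = new NameRules(tokenize(input));
        try {
            p.match("OPEN");
            if (rule.equals("name1")) p.name1(); else p.name2();
            p.match("CLOSE");
            return "action";
        } catch (IllegalStateException e) {
            return p.actionRan ? "action, then error" : "error, no action";
        }
    }

    public static void main(String[] args) {
        String[] inputs = {"(a)", "(a.b.c.d)", "(a x)", "(a.b x)", "(a.b.c x)", "(a.b.c.d x)"};
        for (String in : inputs) {
            System.out.println(in + "  name1: " + run("name1", in) + " | name2: " + run("name2", in));
        }
    }
}
```

Under these assumptions, the sketch reproduces the surprise. On "(a x)" and "(a.b x)", only the name1 path runs the action before the error. On "(a.b.c x)", both paths run it, because every optional block has already been consumed when the bad token appears.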
What am I missing?
Thanks in advance.
Brian
-------- A complete grammar showing the behavior follows --------
class P extends Parser;
// Try the various start rules with the following inputs:
// 1. "(a)", "(a.b)", "(a.b.c)" // ok
// 2. "(a.b.c.d)" // ok
// 3. "(a x)", "(a.b x)" // surprise! only start1 triggers action.
// 4. "(a.b.c x)" // ok
// 5. "(a.b.c.d x)" // ok
// My expectation was that all the inputs would trigger the action
// of printing a message, though input sets 2-5 should throw an
// exception thereafter. (Or, in the case of start1, sets 3-5)
// The surprise is that start2 and start3 don't trigger the action
// on the inputs in set 3.
options { k = 2; }
start1 : OPEN name1 CLOSE
;
name1 : ID ( DOT ID )*
{ System.out.println("Saw a (maybe too long) name."); }
;
start2 : OPEN name2 CLOSE
;
name2 : ID ( DOT ID ( DOT ID )? )?
{ System.out.println("Saw a name."); }
;
start3 : OPEN name3 CLOSE
;
name3 : ID ( part1 | )
{ System.out.println("Saw a name."); }
;
protected
part1 : DOT ID ( part2 | )
;
protected
part2 : DOT ID
;
class L extends Lexer;
options { k = 2; }
ID : ('a'..'z')
;
DOT : '.'
;
OPEN : '('
;
CLOSE : ')'
;
WS : ( ' '|'\t'
| ("\r\n"|'\r'|'\n')
{newline();}
)
{ $setType(Token.SKIP); }
;