[antlr-interest] Parser behavior on invalid input - Newbee question

bchagenbuch bhagenbuch at didera.com
Fri Mar 28 13:23:57 PST 2003


I encountered a behavior that puzzles me (I'm an ANTLR newbie) while writing a rule to parse compound names of 1 to 3 parts, like "a", "a.b", or "a.b.c". I tried three different formulations:

The first one has the disadvantage of recognizing names that are too long, but is fine otherwise.

	name1 : ID ( DOT ID )* { my_action(); } ;


The other two recognize exactly what I want:

	name2 : ID ( DOT ID ( DOT ID )? )? { my_action(); } ;

	name3 : ID ( part1 | ) { my_action(); } ;
	part1 : DOT ID ( part2 | ) ;
	part2 : DOT ID ;

On certain invalid inputs, however, my_action() isn't executed, and I think it should be. 

In particular, when name2 or name3 sees "a.b.c BAD-STUFF", 
	- my_action() is executed, then 
	- the bad stuff causes an exception, as expected,

	BUT
	
when name2 or name3 sees "a.b BAD-STUFF" or "a BAD-STUFF", 
	- the bad stuff causes an exception, and my_action() is skipped. 

When I look at the generated parser, I can see why this occurs, but when I look at the rules in the grammar, I can't.
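
To make that concrete, here is roughly the control flow I believe I am seeing. This is a hand-written Java sketch under my own assumptions, not the code ANTLR actually emits: the class NameRuleSketch, its methods, and the one-character-per-token "lexer" are all made up for illustration, and it uses a single token of lookahead rather than k = 2. The point it tries to show is that the ( ... )* loop simply exits on an unexpected token, while each ( ... )? appears to check that the next token can legally follow the rule before taking the empty path, and raises an error before the action otherwise.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch; NOT actual ANTLR output. All names here are hypothetical.
class NameRuleSketch {
    private final String tokens;  // one char per token: letters are ID, plus '.', '(' and ')'
    private int pos = 0;
    final List<String> log = new ArrayList<>();

    NameRuleSketch(String input) {
        this.tokens = input.replace(" ", "");  // whitespace is skipped, as in the lexer
    }

    // Kind of the next token: 'I' for ID, the character itself for punctuation, 'E' at end of input.
    private char la() {
        if (pos >= tokens.length()) return 'E';
        char c = tokens.charAt(pos);
        return Character.isLowerCase(c) ? 'I' : c;
    }

    private void match(char kind) {
        if (la() != kind) throw new IllegalStateException("mismatched token at " + pos);
        pos++;
    }

    // name1 : ID ( DOT ID )* { action } ;
    void name1() {
        match('I');
        while (la() == '.') {  // the loop just exits on anything that is not DOT
            match('.');
            match('I');
        }
        log.add("action");
    }

    // name2 : ID ( DOT ID ( DOT ID )? )? { action } ;
    void name2() {
        match('I');
        if (la() == '.') {
            match('.');
            match('I');
            if (la() == '.') {
                match('.');
                match('I');
            } else if (la() != ')') {  // empty path is taken only if CLOSE follows
                throw new IllegalStateException("no viable alternative at " + pos);
            }
        } else if (la() != ')') {      // same check for the outer optional part
            throw new IllegalStateException("no viable alternative at " + pos);
        }
        log.add("action");
    }

    // start : OPEN name CLOSE ;  returns true if the action ran, with or without a later error.
    static boolean actionRan(int rule, String input) {
        NameRuleSketch p = new NameRuleSketch(input);
        try {
            p.match('(');
            if (rule == 1) p.name1(); else p.name2();
            p.match(')');
        } catch (IllegalStateException e) {
            // the parse failed; what matters here is whether the action ran first
        }
        return !p.log.isEmpty();
    }

    public static void main(String[] args) {
        for (String in : new String[] {"(a.b.c)", "(a x)", "(a.b x)", "(a.b.c x)"}) {
            System.out.println(in + "  name1: " + actionRan(1, in) + "  name2: " + actionRan(2, in));
        }
    }
}
```

In this sketch, "(a.b.c x)" reaches the action because no optional part is left to decide once the third ID has been matched, so the error only shows up later when CLOSE is expected. For "(a x)" and "(a.b x)", the error is raised inside name2 itself, before the action.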

What am I missing?

Thanks in advance.
Brian

-------- A complete grammar showing the behavior follows --------

class P extends Parser;

// Try the various start rules with the following inputs:
 
// 1.   "(a)", "(a.b)", "(a.b.c)"  // ok
// 2.   "(a.b.c.d)"  // ok
// 3.   "(a x)", "(a.b x)" // surprise! only start1 triggers action.
// 4.   "(a.b.c x)" // ok
// 5.   "(a.b.c.d x)" // ok

// My expectation was that all the inputs would trigger the action
// of printing a message, though input sets 2-5 should throw an 
// exception thereafter. (Or, in the case of start1, sets 3-5)

// The surprise is that start2 and start3 don't trigger the action
// on the inputs in set 3.


options { k = 2; }

start1	: OPEN name1 CLOSE
		;

name1	: ID ( DOT ID )*  
			{ System.out.println("Saw a (maybe too long) name."); } 
		;
		

start2	: OPEN name2 CLOSE
		;

name2	: ID ( DOT ID ( DOT ID )? )? 
			{ System.out.println("Saw a name."); }
		;

start3	: OPEN name3 CLOSE
		;

name3	: ID ( part1 | )
			{ System.out.println("Saw a name."); }
		;

protected
part1	: DOT ID ( part2 |  )
		;

protected
part2	: DOT ID 
		;
		

class L extends Lexer;

options { k = 2; }

ID		: ('a'..'z')
		;

DOT		: '.'
		;

OPEN	: '('
		;

CLOSE	: ')'
		;

WS		: ( ' '|'\t'
		  | ("\r\n"|'\r'|'\n') 
		    	{newline();}
		  ) 
		  	{ $setType(Token.SKIP); }
		;



 
