[antlr-interest] Parser behavior on invalid input - Newbee question

mzukowski at yci.com mzukowski at yci.com
Mon Mar 31 14:27:36 PST 2003


You need to learn about lookahead.  Every "|" (and every optional subrule)
means a decision needs to be made, and antlr uses lookahead to make it.  The
lookahead tests are what get put into the if statements in the generated
parser to decide what to do next.  The reasoning sort of goes, "if I know
there's bad stuff ahead, I'll complain right now," which is before your
action gets a chance to run.
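
To make that concrete, here is a hand-written Java sketch of roughly what the
decision code for name2 looks like.  It is an illustration only, not actual
antlr output: the class LookaheadSketch, the run() helper, the actionRan flag
and the use of IllegalStateException in place of antlr's own exceptions are
all made up for this example, and only one token of lookahead is shown.  The
point is that each optional subrule becomes an if/else-if chain whose last
branch throws, and that chain sits in front of the action.

```java
import java.util.ArrayList;
import java.util.List;

public class LookaheadSketch {
    // Token types mirroring the lexer rules in the grammar.
    enum T { OPEN, CLOSE, ID, DOT, EOF }

    private final List<T> tokens = new ArrayList<>();
    private int pos = 0;
    boolean actionRan = false;   // hypothetical stand-in for the println action

    LookaheadSketch(String input) {
        // Minimal tokenizer: one token per character, whitespace skipped.
        for (char c : input.toCharArray()) {
            if (c == '(') tokens.add(T.OPEN);
            else if (c == ')') tokens.add(T.CLOSE);
            else if (c == '.') tokens.add(T.DOT);
            else if (c >= 'a' && c <= 'z') tokens.add(T.ID);
            else if (!Character.isWhitespace(c))
                throw new IllegalArgumentException("unexpected char: " + c);
        }
    }

    // Type of the i-th lookahead token (1-based), EOF past the end.
    T LA(int i) {
        int p = pos + i - 1;
        return p < tokens.size() ? tokens.get(p) : T.EOF;
    }

    void match(T expected) {
        if (LA(1) != expected)
            throw new IllegalStateException("expecting " + expected + ", found " + LA(1));
        pos++;
    }

    // start2 : OPEN name2 CLOSE ;
    void start2() { match(T.OPEN); name2(); match(T.CLOSE); }

    // name2 : ID ( DOT ID ( DOT ID )? )? { action } ;
    void name2() {
        match(T.ID);
        if (LA(1) == T.DOT) {                 // enter the outer ( ... )?
            match(T.DOT); match(T.ID);
            if (LA(1) == T.DOT) {             // enter the inner ( ... )?
                match(T.DOT); match(T.ID);
            } else if (LA(1) == T.CLOSE) {    // skip it: next token can follow name2
            } else {                          // bad stuff ahead: complain right now
                throw new IllegalStateException("no viable alternative: " + LA(1));
            }
        } else if (LA(1) == T.CLOSE) {        // skip the outer ( ... )?
        } else {
            throw new IllegalStateException("no viable alternative: " + LA(1));
        }
        actionRan = true;                     // the action comes after every decision
    }

    // Parses the input and reports whether the action ran and how parsing ended.
    static String run(String input) {
        LookaheadSketch p = new LookaheadSketch(input);
        String outcome = "ok";
        try { p.start2(); } catch (IllegalStateException e) { outcome = "error"; }
        return (p.actionRan ? "action" : "no action") + ", " + outcome;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"(a)", "(a.b.c)", "(a x)", "(a.b x)", "(a.b.c x)"})
            System.out.println(s + " -> " + run(s));
    }
}
```

In this sketch "(a x)" and "(a.b x)" fail inside name2, at a decision point,
so the action is never reached.  "(a.b.c x)" has no decision left after the
third ID, so the action runs and the error only shows up when start2 tries to
match CLOSE.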

Perhaps check out Ter's recent lectures for his class (see antlr.org).  He
must have covered lookahead by now....

Monty

-----Original Message-----
From: bchagenbuch [mailto:bhagenbuch at didera.com]
Sent: Friday, March 28, 2003 1:24 PM
To: antlr-interest at yahoogroups.com
Subject: [antlr-interest] Parser behavior on invalid input - Newbee
question


I encountered a puzzling (to a newbie like me) behavior while writing a rule
to parse compound names of lengths 1 to 3, like "a", "a.b", or "a.b.c".  I
tried 3 different formulations:

The first one has the disadvantage of recognizing names that are too long,
but is fine otherwise.

	name1 : ID ( DOT ID )* { my_action(); } ;


The other two recognize exactly what I want:

	name2 : ID ( DOT ID ( DOT ID )? )? { my_action(); } ;

	name3 : ID ( part1 | ) { my_action(); } ;
	part1 : DOT ID ( part2 | ) ;
	part2 : DOT ID ;

On certain invalid inputs, however, my_action() isn't executed, and I think
it should be. 

In particular, when name2 or name3 sees "a.b.c BAD-STUFF", 
	- my_action() is executed, then 
	- the bad stuff causes an exception, as expected,

	BUT
	
when name2 or name3 sees "a.b BAD-STUFF" or "a BAD-STUFF", 
	- the bad stuff causes an exception, and my_action() is skipped. 

When I look at the generated parser, I can see why this occurs, but when I
look at the rules in the grammar, I can't.

What am I missing?

Thanks in advance.
Brian

-------- A complete grammar showing the behavior follows --------

class P extends Parser;

// Try the various start rules with the following inputs:
 
// 1.   "(a)", "(a.b)", "(a.b.c)"  // ok
// 2.   "(a.b.c.d)"  // ok
// 3.   "(a x)", "(a.b x)" // surprise! only start1 triggers action.
// 4.   "(a.b.c x)" // ok
// 5.   "(a.b.c.d x)" // ok

// My expectation was that all the inputs would trigger the action
// of printing a message, though input sets 2-5 should throw an 
// exception thereafter. (Or, in the case of start1, sets 3-5)

// The surprise is that start2 and start3 don't trigger the action
// on the inputs in set 3.


options { k = 2; }

start1	: OPEN name1 CLOSE
		;

name1	: ID ( DOT ID )*  
			{ System.out.println("Saw a (maybe too long) name."); }
		;
		

start2	: OPEN name2 CLOSE
		;

name2	: ID ( DOT ID ( DOT ID )? )? 
			{ System.out.println("Saw a name."); }
		;

start3	: OPEN name3 CLOSE
		;

name3	: ID ( part1 | )
			{ System.out.println("Saw a name."); }
		;

protected
part1	: DOT ID ( part2 |  )
		;

protected
part2	: DOT ID 
		;
		

class L extends Lexer;

options { k = 2; }

ID		: ('a'..'z')
		;

DOT		: '.'
		;

OPEN	: '('
		;

CLOSE	: ')'
		;

WS		: ( ' '|'\t'
		  | ("\r\n"|'\r'|'\n') 
		    	{newline();}
		  ) 
		  	{ $setType(Token.SKIP); }
		;



 

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/ 


 




More information about the antlr-interest mailing list