[antlr-interest] Help with grammar keywords

Rick Mann rmann at latencyzero.com
Wed Nov 30 15:25:21 PST 2005


Hi. I've been working on a grammar that parses hierarchical URIs into
internal URIs with query parameters. We have URIs like the following:

http://www.keepmedia.com/pubs/Esquire/
http://www.keepmedia.com/pubs/Esquire/2005
http://www.keepmedia.com/pubs/Esquire/current
http://www.keepmedia.com/pubs/Esquire/columns/Sex/
http://www.keepmedia.com/pubs/Esquire/2005/11/01/1037813
http://www.keepmedia.com/columns/Sex/


These get resolved into URIs used internally like:

/content/Content.do?pubId=19
/content/Content.do?pubId=19&itemId=1037813
/content/Content.do?pubId=19&pgv=current
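
To make the target format concrete, here is a minimal Java sketch of how the internal URI could be assembled once the pieces have been recognized. The class and method names (InternalUri, build) are hypothetical, and the lookup that turns a publication name such as "Esquire" into pubId 19 is assumed to exist elsewhere and is not shown. Only the parameter names pubId, itemId and pgv come from the examples above.

```java
class InternalUri {
    /**
     * Builds the internal URI from already-resolved parts.
     * itemId and variant are optional; pass null to omit them.
     * (Hypothetical helper, not part of the grammar.)
     */
    static String build(int pubId, Integer itemId, String variant) {
        StringBuilder sb = new StringBuilder("/content/Content.do?pubId=").append(pubId);
        if (itemId != null) {
            sb.append("&itemId=").append(itemId);
        }
        if (variant != null) {
            // Variants are plain lowercase keywords, so no URL encoding is done here.
            sb.append("&pgv=").append(variant);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(19, null, null));
        System.out.println(build(19, 1037813, null));
        System.out.println(build(19, null, "current"));
    }
}
```

Run as written, this should reproduce the three internal URIs listed above.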

For the most part, I've got a grammar that works, but one difficulty  
I'm having has to do with an optional keyword that may hang off the  
end ("current" is an example above). There are a handful of such  
special words, and they're always found at the very end of a URI.
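
The trailing-keyword rule can be sketched outside the grammar in plain Java. This is only an illustration of the behaviour wanted, not a replacement for the parser. The class name TrailingVariant and its methods are hypothetical; the keyword list is copied from the VARIANT rule in the grammar below. It assumes a keyword only counts as a variant when it is the last segment and is not the only segment (so "/columns/Sex/" has no variant).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TrailingVariant {
    // Keywords taken from the VARIANT lexer rule.
    static final Set<String> VARIANTS = new HashSet<>(Arrays.asList(
        "columns", "current", "picks", "popular", "premium", "related", "suggested"));

    /** Splits a path on '/', dropping empty segments from leading or trailing slashes. */
    static List<String> segments(String path) {
        List<String> out = new ArrayList<>();
        for (String s : path.split("/")) {
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    /** Returns the trailing variant keyword, or null if the last segment is not one. */
    static String trailingVariant(String path) {
        List<String> segs = segments(path);
        // A lone segment such as "/columns" is a path head, not a variant.
        if (segs.size() < 2) {
            return null;
        }
        String last = segs.get(segs.size() - 1);
        return VARIANTS.contains(last) ? last : null;
    }

    public static void main(String[] args) {
        System.out.println(trailingVariant("/pubs/Esquire/current"));
        System.out.println(trailingVariant("/pubs/Esquire/2005"));
        System.out.println(trailingVariant("/topics/World/Europe/Western/popular"));
    }
}
```

Note that "columns" and "picks" appear both as path heads and as variants, which is part of what makes this awkward to express in the lexer.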

Right now it's not working. I've included the grammar below, but
it's been a few weeks since I last touched it, and I'm not sure what
state I left it in. Some test code I have complains bitterly. Here's
sample output:

This works:
--------------------
Parsing: /topics/World/Europe/Western/popular
Topics path
Subtopic: World
Subtopic: Europe
Subtopic: Western
Variant: popular



This doesn't:
--------------------
Parsing: /pubs
exception: line 1:2: unexpected char: 'p'

Parsing: /pubs/
exception: line 1:2: unexpected char: 'p'

Parsing: /pubs/popular
exception: line 1:2: unexpected char: 'p'



Any suggestions would be most welcome. Sorry for the lack of more  
specific info. Thanks!


Grammar
--------------------------------


/*
URL Variants

http://localhost/pubs/
http://localhost/pubs/<pubID>
http://localhost/pubs/<pubName>
http://localhost/pubs/

http://localhost/pubs/OpinionJournal.com/suggested/
http://localhost/pubs/OpinionJournal.com/2005/03/28/727710

http://localhost/topics/

*/



class URIParser extends Parser;

options
{
	k=2;
}

path
	:
	PATHSEP!
	(	articlesPath
	|	picksPath
	|	pubsPath
	|	rfsPath
	|	topicsPath
	)
	(variant)?
	(PATHSEP!)?
	(EOL!)?
	;

protected
articlesPath
	:	"articles"			{ System.out.println("Articles path"); }
		PATHSEP!
		id:ID				{ System.out.println("Article ID: " + id.getText()); }
	;
	
protected
picksPath
	:	"picks"				{ System.out.println("Picks path"); }
		PATHSEP!
		id:ID				{ System.out.println("Picks ID: " + id.getText()); }
	;
	
protected
pubsPath
	:	PUBS!				{ System.out.println("Pubs path"); }
	(	PATHSEP!
		id:ID				{ System.out.println("Pubs ID: " + id.getText()); }
	)
	(	PATHSEP!
		articlesPath
	)?
	;
	
protected
rfsPath
	:
	(	"columns"			{ System.out.println("Columns path"); }
	|	"sections"			{ System.out.println("Sections path"); }
	|	"rfs"				{ System.out.println("rfs path"); }
	)
		PATHSEP!
		id:ID				{ System.out.println("rf ID: " + id.getText()); }
	;
	
protected
topicsPath
	:	"topics"			{ System.out.println("Topics path"); }
	(	(	PATHSEP!
			id:ID				{ System.out.println("Top topic ID: " + id.getText()); }
		)
		|
		(	PATHSEP!
			stn:STRING_LITERAL	{ System.out.println("Subtopic: " + stn.getText()); }
		)*
	)
	;
	
protected
variant
	:	PATHSEP!
		s:VARIANT	{ System.out.println("Variant: " + s.getText()); }
	;
	
class URILexer extends Lexer;

options
{
	k=2;
	testLiterals=true;
	filter=false;
}

	/*
tokens
{
	ARTICLES="articles";
	COLUMNS="columns";
	CURRENT="current";
	ITEMS="items";
	PICKS="picks";
	POPULAR="popular";
	PREMIUM="premium";
	PUBS="pubs";
	RFS="rfs";
	SECTIONS="sections";
	SUGGESTED="suggested";
	TOPICS="topics";
}
	*/

VARIANT
	:	COLUMNS
	|	CURRENT
	|	PICKS
	|	POPULAR
	|	PREMIUM
	|	RELATED
	|	SUGGESTED
	;


protected	COLUMNS				:	"columns";
protected	CURRENT				:	"current";
protected	PICKS				:	"picks";
protected	POPULAR				:	"popular";
protected	PREMIUM				:	"premium";
protected	RELATED				:	"related";
protected	SUGGESTED			:	"suggested";

protected	PUBS				:	"pubs";

PATHSEP
	:	(SLASH)
	;
	
ID
	:	('0'..'9')+
	;
	
STRING_LITERAL
	:	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'.')+
	;

protected
SLASH				:	'/';
EOL					:	('\r'|'\n');
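
A possible lead, not confirmed against the generated lexer: with k=2, ANTLR 2 uses linear approximate lookahead, which checks each lookahead depth against its own character set instead of checking real two-character sequences. If that is what happens here, 'p' followed by 'u' could look like the start of VARIANT ('p' from "picks", 'u' from "current") even though no keyword begins with "pu", and the lexer would then fail inside VARIANT with "unexpected char: 'p'". The Java sketch below is only a model of that idea. LookaheadSketch and its methods are hypothetical and do not call ANTLR.

```java
import java.util.HashSet;
import java.util.Set;

class LookaheadSketch {
    // Keywords taken from the VARIANT lexer rule.
    static final String[] KEYWORDS = {
        "columns", "current", "picks", "popular", "premium", "related", "suggested"};

    /** Characters that appear at position depth (0-based) of any keyword. */
    static Set<Character> charsAt(int depth) {
        Set<Character> set = new HashSet<>();
        for (String k : KEYWORDS) {
            set.add(k.charAt(depth));
        }
        return set;
    }

    /** Approximate test: each depth is checked against its own set, independently. */
    static boolean approxPredictsVariant(String input) {
        return input.length() >= 2
            && charsAt(0).contains(input.charAt(0))
            && charsAt(1).contains(input.charAt(1));
    }

    /** Exact test: some keyword really starts with the first two characters. */
    static boolean exactPrefixMatch(String input) {
        if (input.length() < 2) {
            return false;
        }
        String prefix = input.substring(0, 2);
        for (String k : KEYWORDS) {
            if (k.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"pubs", "popular", "topics"}) {
            System.out.println(s + ": approx=" + approxPredictsVariant(s)
                + " exact=" + exactPrefixMatch(s));
        }
    }
}
```

In this model "pubs" passes the approximate test but fails the exact one, while "topics" fails both and so would never be routed to VARIANT, which is at least consistent with the output above.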



