[antlr-interest] Help with grammar keywords
Rick Mann
rmann at latencyzero.com
Thu Dec 1 22:53:21 PST 2005
I thought I'd repost this since no one responded. Please let me know
what else I should post. Thanks!
Hi. I was working on a grammar to parse hierarchical URIs into a URI
with parameters. We have URIs like the following:
http://www.keepmedia.com/pubs/Esquire/
http://www.keepmedia.com/pubs/Esquire/2005
http://www.keepmedia.com/pubs/Esquire/current
http://www.keepmedia.com/pubs/Esquire/columns/Sex/
http://www.keepmedia.com/pubs/Esquire/2005/11/01/1037813
http://www.keepmedia.com/columns/Sex/
These get resolved into URIs used internally like:
/content/Content.do?pubId=19
/content/Content.do?pubId=19&itemId=1037813
/content/Content.do?pubId=19&pgv=current
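To make the target of the translation concrete, here is a minimal Java sketch that builds the internal URIs shown above from values that have already been parsed out of the path. The class name `InternalUriSketch` and the method `build` are made up for illustration, and it assumes the publication name has already been looked up as a numeric pubId; it is not the actual resolver.

```java
import java.util.StringJoiner;

// Hypothetical helper, for illustration only: assembles the internal
// Content.do URI from already-parsed values.
final class InternalUriSketch {

    // Pass null for itemId or variant when that part is absent from the path.
    static String build(int pubId, Long itemId, String variant) {
        // Prefix carries the fixed path; parameters are joined with '&'.
        StringJoiner query = new StringJoiner("&", "/content/Content.do?", "");
        query.add("pubId=" + pubId);
        if (itemId != null) {
            query.add("itemId=" + itemId);
        }
        if (variant != null) {
            // "pgv" is the parameter name used in the examples above.
            query.add("pgv=" + variant);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(19, null, null));
        System.out.println(build(19, 1037813L, null));
        System.out.println(build(19, null, "current"));
    }
}
```

In a real grammar the three arguments would presumably be collected in the parser actions instead of being printed.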
For the most part, I've got a grammar that works, but one difficulty
I'm having has to do with an optional keyword that may hang off the
end ("current" is an example above). There are a handful of such
special words, and they're always found at the very end of a URI.
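The intended behaviour for the trailing keyword can be sketched in plain Java, outside of ANTLR. This is only an illustration of what the grammar is meant to recognise, not a replacement for it: the class name `TrailingVariantSketch` and its methods are hypothetical, and the keyword list is copied from the VARIANT rule in the grammar below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class TrailingVariantSketch {

    // Keywords taken from the VARIANT lexer rule in the grammar.
    static final Set<String> VARIANTS = new HashSet<>(Arrays.asList(
            "columns", "current", "picks", "popular",
            "premium", "related", "suggested"));

    // Splits a path on '/' and drops empty segments, so a leading or
    // trailing slash does not produce an empty entry.
    static List<String> segments(String path) {
        List<String> out = new ArrayList<>();
        for (String s : path.split("/")) {
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    // Returns the variant keyword if the last segment is one, else null.
    static String trailingVariant(String path) {
        List<String> segs = segments(path);
        if (segs.isEmpty()) {
            return null;
        }
        String last = segs.get(segs.size() - 1);
        return VARIANTS.contains(last) ? last : null;
    }

    public static void main(String[] args) {
        System.out.println(trailingVariant("/topics/World/Europe/Western/popular"));
        System.out.println(trailingVariant("/pubs/"));
    }
}
```

Note that a word only counts as a variant here because of its position at the end; the same word earlier in the path (for example "columns" in /columns/Sex/) is treated as an ordinary segment.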
In any case, it's not working: paths under /topics parse, but anything
starting with /pubs fails with an "unexpected char" error. I've
included the grammar below. However, it's been a few weeks since I
touched it, and I'm not sure what state I left it in. Here's sample
output from my test code:
This works:
--------------------
Parsing: /topics/World/Europe/Western/popular
Topics path
Subtopic: World
Subtopic: Europe
Subtopic: Western
Variant: popular
This doesn't:
--------------------
Parsing: /pubs
exception: line 1:2: unexpected char: 'p'
Parsing: /pubs/
exception: line 1:2: unexpected char: 'p'
Parsing: /pubs/popular
exception: line 1:2: unexpected char: 'p'
Any suggestions would be most welcome. Sorry for the lack of more
specific info. Thanks!
Grammar
--------------------------------
/*
URL Variants
http://localhost/pubs/
http://localhost/pubs/<pubID>
http://localhost/pubs/<pubName>
http://localhost/pubs/
http://localhost/pubs/OpinionJournal.com/suggested/
http://localhost/pubs/OpinionJournal.com/2005/03/28/727710
http://localhost/topics/
*/
class URIParser extends Parser;

options
{
    k=2;
}

path
    :   PATHSEP!
        (   articlesPath
        |   picksPath
        |   pubsPath
        |   rfsPath
        |   topicsPath
        )
        (variant)?
        (PATHSEP!)?
        (EOL!)?
    ;

protected
articlesPath
    :   "articles"  { System.out.println("Articles path"); }
        PATHSEP!
        id:ID       { System.out.println("Article ID: " + id.getText()); }
    ;

protected
picksPath
    :   "picks"     { System.out.println("Picks path"); }
        PATHSEP!
        id:ID       { System.out.println("Picks ID: " + id.getText()); }
    ;

protected
pubsPath
    :   PUBS!       { System.out.println("Pubs path"); }
        (   PATHSEP!
            id:ID   { System.out.println("Pubs ID: " + id.getText()); }
        )
        (   PATHSEP!
            articlesPath
        )?
    ;

protected
rfsPath
    :
        (   "columns"   { System.out.println("Columns path"); }
        |   "sections"  { System.out.println("Sections path"); }
        |   "rfs"       { System.out.println("rfs path"); }
        )
        PATHSEP!
        id:ID       { System.out.println("rf ID: " + id.getText()); }
    ;

protected
topicsPath
    :   "topics"    { System.out.println("Topics path"); }
        (   (   PATHSEP!
                id:ID   { System.out.println("Top topic ID: " + id.getText()); }
            )
        |
            (   PATHSEP!
                stn:STRING_LITERAL  { System.out.println("Subtopic: " + stn.getText()); }
            )*
        )
    ;

protected
variant
    :   PATHSEP!
        s:VARIANT   { System.out.println("Variant: " + s.getText()); }
    ;

class URILexer extends Lexer;

options
{
    k=2;
    testLiterals=true;
    filter=false;
}

/*
tokens
{
    ARTICLES="articles";
    COLUMNS="columns";
    CURRENT="current";
    ITEMS="items";
    PICKS="picks";
    POPULAR="popular";
    PREMIUM="premium";
    PUBS="pubs";
    RFS="rfs";
    SECTIONS="sections";
    SUGGESTED="suggested";
    TOPICS="topics";
}
*/

VARIANT
    :   COLUMNS
    |   CURRENT
    |   PICKS
    |   POPULAR
    |   PREMIUM
    |   RELATED
    |   SUGGESTED
    ;

protected COLUMNS   : "columns";
protected CURRENT   : "current";
protected PICKS     : "picks";
protected POPULAR   : "popular";
protected PREMIUM   : "premium";
protected RELATED   : "related";
protected SUGGESTED : "suggested";
protected PUBS      : "pubs";

PATHSEP
    :   (SLASH)
    ;

ID
    :   ('0'..'9')+
    ;

STRING_LITERAL
    :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'.')+
    ;

protected
SLASH   : '/';

EOL     : ('\r'|'\n');

--
Rick