[antlr-interest] Grammar help question (iCalendar)

Jin Choi jsc at alum.mit.edu
Thu Nov 7 15:55:06 PST 2002


I'm attempting to write a grammar to parse iCalendar files (RFC 2445), and have 
a question about how best to handle one grammar issue.

There are many variants of lines of the form
FIELD;param1=foo;param2="bar":field value

where the parameters are optional name-value pairs, and the type of the field 
value depends on which field you are parsing. Some fields take nearly arbitrary 
text, while others are limited to particular strings ("VERSION:2.0") or are 
structured, such as timestamps. So, I have something like:

version : "VERSION" (params)* ':' "2.0" ;
prodid : "PRODID" (params)* ':' TEXTVALUE ;
params : ';' PARAMNAME '=' (PARAM_VALUE)? (',' PARAM_VALUE)* ;

So here's the problem: I obviously can't define a lexer rule for TEXTVALUE, 
since a rule that accepts nearly arbitrary text would be ambiguous with almost 
every other token. Perhaps I could use a multiplexed lexer for this, but it 
seems that you can only switch the lexer state from within a lexer rule, and 
not from the parser, which is where you know what type of value to look for.
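
To make the idea concrete, here is a minimal sketch of a scanner whose mode is 
chosen by the parser. It is plain Python rather than ANTLR, and the names 
Scanner and parse_line are hypothetical, made up for illustration. It assumes 
one unfolded content line as input and a simplified notion of names and 
parameter values, not the full RFC 2445 rules.

```python
import re

NAME = re.compile(r"[A-Za-z0-9-]+")
# Assumption: a parameter value is a quoted string or a run without ; : , "
PARAM_VALUE = re.compile(r'"[^"]*"|[^;:,"]*')


class Scanner:
    """Hypothetical pull scanner: the caller decides how the next token is read."""

    def __init__(self, line):
        self.line = line
        self.pos = 0

    def name(self):
        # Field and parameter names.
        m = NAME.match(self.line, self.pos)
        if not m:
            raise SyntaxError(f"expected a name at column {self.pos}")
        self.pos = m.end()
        return m.group()

    def accept(self, ch):
        # Consume the literal ch if it comes next; report whether it did.
        if self.line.startswith(ch, self.pos):
            self.pos += len(ch)
            return True
        return False

    def param_value(self):
        # The pattern can match the empty string, so this never fails.
        m = PARAM_VALUE.match(self.line, self.pos)
        self.pos = m.end()
        return m.group().strip('"')

    def rest(self):
        # "Text mode": everything up to the end of the line is one token.
        text = self.line[self.pos:]
        self.pos = len(self.line)
        return text


def parse_line(line):
    """Return (field, params, raw_value) for one content line."""
    s = Scanner(line)
    field = s.name().upper()
    params = {}
    while s.accept(";"):
        pname = s.name().upper()
        if not s.accept("="):
            raise SyntaxError(f"expected '=' at column {s.pos}")
        values = [s.param_value()]
        while s.accept(","):
            values.append(s.param_value())
        params[pname] = values
    if not s.accept(":"):
        raise SyntaxError(f"expected ':' at column {s.pos}")
    # Only here, after the ':', does the parser ask for free text.
    return field, params, s.rest()
```

Because the parser asks for free text only after it has consumed the ':', the 
text rule never competes with the name and parameter rules.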

I could just define each component as a single token and do all the parsing in 
the lexer, using protected rules. Are there any downsides to that?
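
A rough sketch of that second option, again in plain Python rather than ANTLR: 
the whole value arrives as one raw token, and its interpretation is picked by 
field name afterwards. The table VALUE_PARSERS and the helper names are 
hypothetical; DTSTAMP stands in for the timestamp fields, and the date-time 
form handled here (YYYYMMDDTHHMMSS with an optional trailing Z) is a 
simplifying assumption.

```python
import re
from datetime import datetime


def parse_version(raw):
    # Fields limited to particular strings: accept only the literal.
    if raw != "2.0":
        raise ValueError(f"unsupported VERSION value: {raw!r}")
    return raw


def parse_text(raw):
    # Undo backslash escapes: \n and \N become newlines, and any other
    # escaped character (such as \\ \; \,) stands for itself.
    return re.sub(r"\\(.)",
                  lambda m: "\n" if m.group(1) in "nN" else m.group(1),
                  raw)


def parse_datetime(raw):
    # Assumed form: 20021107T155506, optionally followed by Z.
    if raw.endswith("Z"):
        raw = raw[:-1]
    return datetime.strptime(raw, "%Y%m%dT%H%M%S")


# Hypothetical dispatch table from field name to value interpreter.
VALUE_PARSERS = {
    "VERSION": parse_version,
    "PRODID": parse_text,
    "DTSTAMP": parse_datetime,
}


def parse_value(field, raw):
    # Unknown fields fall back to plain text.
    return VALUE_PARSERS.get(field.upper(), parse_text)(raw)
```

One visible cost of this approach is that the structure of each value lives in 
ordinary code instead of in the grammar itself.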

How would you write this?




