[antlr-interest] Re: Grammar help question (iCalendar)

Mon Nov 11 14:15:00 PST 2002

It turns out to be remarkably difficult (for me, anyway) to figure out how to write 
a grammar specification for iCalendar in ANTLR (and in JavaCC, which I've 
also looked into).

As described below, iCalendar has a lot of places where you have something 
like

parameter: FIELDNAME lots_of_stuff TYPEDVALUE

where the value must correspond to a particular type based on the field name. 
You can't just switch on what it looks like (20021111T183000 looks like a 
datetime stamp, but it might just be a funny-looking free-text field). You can 
treat every value as plain text and deal with it semantically afterwards, but 
then what's the point of using a parser?
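The rule "the field name decides the value type" can be written as a lookup table from field name to value parser. A parser action could consult that table, or a later pass could. The Java sketch below is hypothetical: the class, the table, and the small set of fields are illustrative and do not come from any library. It handles only the basic local DATE-TIME form from RFC 2445 (no trailing Z, no TZID). It shows the same text 20021111T183000 being typed as a date-time for DTSTART and kept as plain text for SUMMARY.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.function.Function;

public class TypedValues {
    // Basic DATE-TIME form, e.g. 20021111T183000 (local time only in this sketch;
    // the UTC 'Z' suffix and TZID parameter are not handled).
    private static final DateTimeFormatter DATE_TIME =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss");

    // Hypothetical table: the field name, not the value's shape, picks the value parser.
    private static final Map<String, Function<String, Object>> VALUE_PARSERS = Map.of(
            "DTSTART", v -> LocalDateTime.parse(v, DATE_TIME),
            "DTEND", v -> LocalDateTime.parse(v, DATE_TIME),
            "SUMMARY", v -> v,   // free text: taken verbatim
            "VERSION", v -> {
                if (!v.equals("2.0")) throw new IllegalArgumentException("unsupported VERSION: " + v);
                return v;
            });

    static Object parseValue(String fieldName, String rawValue) {
        Function<String, Object> parser = VALUE_PARSERS.get(fieldName.toUpperCase());
        if (parser == null) return rawValue;   // unknown or extension field: keep as text
        return parser.apply(rawValue);         // may throw if the value has the wrong shape
    }

    public static void main(String[] args) {
        System.out.println(parseValue("DTSTART", "20021111T183000")); // 2002-11-11T18:30
        System.out.println(parseValue("SUMMARY", "20021111T183000")); // 20021111T183000
    }
}
```

A malformed value such as `DTSTART:tomorrow` fails with a `DateTimeParseException` instead of slipping through as text. That is the check a value-agnostic lexer cannot make on its own.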

I'm looking into hand-writing a recursive descent parser for iCalendar; please 
contact me if you make progress with ANTLR or have any suggestions.

On Monday, November 11, 2002, at 03:13 PM, george_hastings wrote:

I just began looking at creating an ANTLR-based parser for iCalendar.
I want to work with iCalendar as XML (xCal), but in order to be fully
useful, I need to bridge to iCal. Perhaps we can cooperate on the
development?

G.H.
george_hastings at yahoo.com

--- In antlr-interest at y..., "Jin Choi" <jsc at a...> wrote:
I'm attempting to write a grammar to parse iCalendar files (RFC 2445), and
have some questions on how best to handle a grammar issue.

There are many variants of lines of the form
FIELD;param1=foo;param2="bar":field value

where the parameters are optional name-value pairs, and the field value is
typed, depending on the type of field you are trying to parse. Some fields take
nearly arbitrary text, while others are limited to particular strings
("VERSION:2.0") or are structured, such as timestamps. So, I have something
like:

version : "VERSION" (params)* ':' "2.0" ;
prodid : "PRODID" (params)* ':' TEXTVALUE ;
params : ';' PARAMNAME '=' (PARAM_VALUE)? (',' PARAM_VALUE)* ;
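For comparison with the rules above, here is roughly what a hand-written recursive-descent version of the same line shape might look like. It is a hypothetical Java sketch, not ANTLR output. It assumes the line has already been unfolded, does not validate names against the RFC 2445 character classes, and returns the value as raw text so that a per-field routine can type it afterwards. Each rule (contentline, param, param-value) maps to one method.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Recursive-descent sketch for one unfolded content line:
//   contentline = name *(";" param) ":" value
//   param       = param-name "=" param-value *("," param-value)
//   param-value = paramtext / DQUOTE quoted-text DQUOTE
public class ContentLineParser {
    // Result of parsing one line; the value is still untyped text.
    static final class ContentLine {
        final String name;
        final Map<String, List<String>> params;
        final String value;
        ContentLine(String name, Map<String, List<String>> params, String value) {
            this.name = name; this.params = params; this.value = value;
        }
    }

    private final String s;
    private int pos = 0;

    private ContentLineParser(String s) { this.s = s; }

    static ContentLine parse(String line) { return new ContentLineParser(line).contentLine(); }

    private ContentLine contentLine() {
        String name = readUntilAny(";:").toUpperCase();
        Map<String, List<String>> params = new LinkedHashMap<>();
        while (peek() == ';') {
            pos++;                                  // consume ';'
            param(params);
        }
        expect(':');
        return new ContentLine(name, params, s.substring(pos)); // rest of line is the value
    }

    private void param(Map<String, List<String>> params) {
        String paramName = readUntilAny("=").toUpperCase();
        expect('=');
        List<String> values = new ArrayList<>();
        values.add(paramValue());
        while (peek() == ',') {
            pos++;                                  // consume ','
            values.add(paramValue());
        }
        params.put(paramName, values);
    }

    private String paramValue() {
        if (peek() == '"') {                        // quoted values may contain ; : ,
            pos++;
            int end = s.indexOf('"', pos);
            if (end < 0) throw error("unterminated quoted parameter value");
            String v = s.substring(pos, end);
            pos = end + 1;
            return v;
        }
        return readUntilAny(";:,");                 // unquoted paramtext, possibly empty
    }

    private String readUntilAny(String stops) {
        int start = pos;
        while (pos < s.length() && stops.indexOf(s.charAt(pos)) < 0) pos++;
        return s.substring(start, pos);
    }

    private char peek() { return pos < s.length() ? s.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw error("expected '" + c + "'");
        pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at column " + pos + " in: " + s);
    }

    public static void main(String[] args) {
        ContentLine cl = parse("FIELD;param1=foo;param2=\"bar\":field value");
        System.out.println(cl.name);    // FIELD
        System.out.println(cl.params);  // {PARAM1=[foo], PARAM2=[bar]}
        System.out.println(cl.value);   // field value
    }
}
```

Because the parser owns the cursor, it always knows which field it is in when it reaches the ':'. That is the information a separate lexer lacks in the TEXTVALUE problem.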

So here's the problem: I obviously can't define a lexer rule for TEXTVALUE,
since it would create all kinds of ambiguities. Perhaps I could use a
multiplexed lexer for this, but it seems like you can only switch the lexer
state from within a lexer rule, and not from the parser, which is where you
know the type of value you should be looking for.
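The idea of letting the parser choose how the next token is lexed can be illustrated without ANTLR. In this hypothetical Java sketch (none of the class names come from ANTLR or any other library), the lexer exposes a mode field. The parser sets that field after it has matched the field name and the ':'. The scheme only works because the lexer produces tokens lazily. A parser that had already buffered lookahead tokens past the ':' would have received them lexed in the old mode, and that is the main thing to watch for when switching lexers from parser actions.

```java
// Hypothetical sketch: a lexer whose mode is chosen by the parser, so the value
// after ':' is tokenized according to the field name already seen.
public class ModalLexerDemo {
    enum Mode { HEADER, TEXT_VALUE, DATE_TIME_VALUE }

    static final class Token {
        final String type;
        final String text;
        Token(String type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    static final class ModalLexer {
        private final String s;
        private int pos = 0;
        Mode mode = Mode.HEADER;              // set by the parser, not by lexer rules

        ModalLexer(String s) { this.s = s; }

        // Tokens are produced one at a time, on demand, so a mode change made by
        // the parser affects the very next token.
        Token next() {
            if (pos >= s.length()) return new Token("EOF", "");
            switch (mode) {
                case TEXT_VALUE: {            // everything to end of line is one token
                    String t = s.substring(pos);
                    pos = s.length();
                    return new Token("TEXT", t);
                }
                case DATE_TIME_VALUE: {       // basic form, optional trailing 'Z'
                    String t = s.substring(pos);
                    if (!t.matches("\\d{8}T\\d{6}Z?"))
                        throw new IllegalArgumentException("not a DATE-TIME: " + t);
                    pos = s.length();
                    return new Token("DATE_TIME", t);
                }
                default: {                    // HEADER mode: names and punctuation
                    char c = s.charAt(pos);
                    if (c == ':' || c == ';' || c == '=') {
                        pos++;
                        return new Token(String.valueOf(c), String.valueOf(c));
                    }
                    int start = pos;
                    while (pos < s.length() && ":;=".indexOf(s.charAt(pos)) < 0) pos++;
                    return new Token("NAME", s.substring(start, pos));
                }
            }
        }
    }

    // Parser for "NAME ':' value" (parameters omitted for brevity). After the ':'
    // it switches the lexer mode based on the field name it has just matched.
    static Token parseValue(String line) {
        ModalLexer lexer = new ModalLexer(line);
        Token name = lexer.next();
        if (!name.type.equals("NAME")) throw new IllegalArgumentException("expected field name");
        if (!lexer.next().type.equals(":")) throw new IllegalArgumentException("expected ':'");
        lexer.mode = name.text.equalsIgnoreCase("DTSTART") ? Mode.DATE_TIME_VALUE : Mode.TEXT_VALUE;
        return lexer.next();
    }

    public static void main(String[] args) {
        System.out.println(parseValue("DTSTART:20021111T183000")); // DATE_TIME(20021111T183000)
        System.out.println(parseValue("SUMMARY:20021111T183000")); // TEXT(20021111T183000)
    }
}
```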

I could just define each component as a single token and do all the parsing in
the lexer, using protected rules. Are there any downsides to that?

How would you write this?
