[antlr-interest] Grammar help question (iCalendar)
Jin Choi
jsc at alum.mit.edu
Thu Nov 7 15:55:06 PST 2002
I'm attempting to write a grammar to parse iCalendar files (RFC 2445), and have
a question about how best to handle one grammar issue.
There are many variants of lines of the form

    FIELD;param1=foo;param2="bar":field value

where the parameters are optional name-value pairs, and the field value is
typed according to the field being parsed. Some fields take nearly arbitrary
text, while others are limited to particular strings ("VERSION:2.0") or are
structured, such as timestamps. So, I have something like:

    version : "VERSION" (params)* ':' "2.0" ;
    prodid : "PRODID" (params)* ':' TEXTVALUE ;
    params : ';' PARAMNAME '=' (PARAM_VALUE)? (',' PARAM_VALUE)* ;

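To make the line shape concrete outside ANTLR, here is a minimal Python sketch that splits one content line into its name, parameters, and raw value. The function names `split_content_line` and `_split_unquoted` are hypothetical, and the sketch assumes the line is already unfolded and that double quotes only ever delimit parameter values. It is an illustration of the format, not a full RFC 2445 parser.

```python
def _split_unquoted(text, sep):
    """Split text on sep, ignoring separators inside double quotes."""
    pieces, current, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            pieces.append("".join(current))
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current))
    return pieces


def split_content_line(line):
    """Split 'NAME;p1=a;p2="b":value' into (name, params, raw_value)."""
    in_quotes = False
    colon = -1
    # The first ':' outside double quotes ends the name/parameter head.
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == ":" and not in_quotes:
            colon = i
            break
    if colon < 0:
        raise ValueError("no ':' separator in content line")

    head, raw_value = line[:colon], line[colon + 1:]
    parts = _split_unquoted(head, ";")
    name = parts[0].upper()

    params = {}
    for part in parts[1:]:
        pname, _, pvalues = part.partition("=")
        # A parameter may carry several comma-separated values.
        values = _split_unquoted(pvalues, ",")
        params[pname.upper()] = [v.strip('"') for v in values]
    return name, params, raw_value
```

The value comes back as an uninterpreted string. Deciding how to parse it is left to whoever knows the field name, which is exactly the part the grammar above struggles with.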
So here's the problem: I obviously can't define a lexer rule for TEXTVALUE,
since it would match nearly any input and so be ambiguous with the other token
rules. Perhaps I could use a multiplexed lexer for this, but it seems you can
only switch the lexer state from within a lexer rule, and not from the parser,
which is where you know the type of value you should be looking for.
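The underlying idea, that the value's type is only known once the field name has been seen, can be sketched in plain Python rather than ANTLR. Everything here is an assumption for illustration: the table `VALUE_PARSERS`, the helper names, and the choice of DTSTAMP as the example timestamp field. The timestamp parser handles only the `YYYYMMDDTHHMMSS` form with an optional trailing `Z`.

```python
import re
from datetime import datetime


def parse_version(raw):
    """VERSION accepts exactly one literal value."""
    if raw != "2.0":
        raise ValueError("unsupported VERSION: %r" % raw)
    return raw


def parse_text(raw):
    """Undo the backslash escapes used in TEXT values."""
    def unescape(match):
        ch = match.group(1)
        return "\n" if ch in "nN" else ch
    return re.sub(r"\\([\\;,nN])", unescape, raw)


def parse_datetime(raw):
    """Parse 'YYYYMMDDTHHMMSS', ignoring an optional trailing 'Z'."""
    if raw.endswith("Z"):
        raw = raw[:-1]
    return datetime.strptime(raw, "%Y%m%dT%H%M%S")


# Hypothetical dispatch table: field name -> value parser.
VALUE_PARSERS = {
    "VERSION": parse_version,
    "PRODID": parse_text,
    "DTSTAMP": parse_datetime,
}


def parse_value(name, raw):
    """Choose the value parser after the field name is known."""
    parser = VALUE_PARSERS.get(name.upper(), parse_text)
    return parser(raw)
```

For example, `parse_value("VERSION", "2.0")` accepts the literal, while `parse_value("VERSION", "1.0")` raises `ValueError`. Unknown field names fall back to `parse_text`, which is a design choice of this sketch and not something the RFC mandates.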
Alternatively, I could define each component as a single token and do all the
parsing in the lexer, using protected rules. Are there any downsides to that?
How would you write this?