[antlr-interest] Greedy Token Matching?

Mihai Danila viridium at gmail.com
Wed Dec 17 09:42:39 PST 2008


Hello,

As part of a language I'm writing a parser for, I use ISO dates that may have
this form (spaces added for clarity):

yyyy-MM-dd ( T hh:mm:ss (+/- HH:MM)? )?

The language also allows for dash-separated ranges of dates, like
this: date-date, or, like in DMQL, date- (meaning "date" or less).

The arrangement that seemed to work best for me specified ISO dates as
tokens. In other words, an ISO date is matched by a token, not by a rule.
This all works fine, except for expressions of the form

2008-10-21T00:00:00-2008-10-21T00:00:01

which cause the ANTLR lexer to fail inside the date rule when it reaches the
range dash.

2008-10-21T00:00:00-

The lexer gets this far, but then it assumes the dash is part of the date
token (presumably the start of a time zone offset) and chokes on what
follows, instead of keeping the "2008-10-21T00:00:00" portion and giving the
range rule a chance to match the dash as a range dash.
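
To make the desired behavior concrete, here is a minimal sketch in Python
rather than ANTLR. It assumes a regular-expression mirror of the ISODateTime
token and relies on regex backtracking to give the dash back when no complete
offset follows it. The names ISO_DATETIME and tokenize are made up for
illustration and are not part of my grammar or of ANTLR.

```python
import re

# Hypothetical regex mirror of the ISODateTime token (illustration only).
ISO_DATETIME = re.compile(
    r"\d{4}-\d{2}-\d{2}"               # ISODate
    r"(?:T\d{2}:\d{2}:\d{2}(?:\.\d)?"  # 'T' ISOTime with optional fraction
    r"(?:Z|[+-]\d{2}:\d{2})?"          # optional 'Z' or time zone offset
    r")?"
)


def tokenize(text):
    """Split text into ISODateTime tokens and range dashes, skipping spaces."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] == " ":
            pos += 1
            continue
        match = ISO_DATETIME.match(text, pos)
        if match:
            # The regex engine backtracks out of a partial offset such as
            # "-20", so a trailing range dash is left for the next iteration.
            tokens.append(("ISODateTime", match.group()))
            pos = match.end()
        elif text[pos] == "-":
            tokens.append(("DASH", "-"))
            pos += 1
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens
```

With this sketch, tokenize("2008-10-21T00:00:00-2008-10-21T00:00:01") yields
a date-time, a dash, and a second date-time, which is the split I would like
the ANTLR lexer to produce.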

I realize that promoting tokens to rules may solve this, but I'd rather
understand what all the options are before going that route. That route
would impact the grammar in more than one way. Are there any options or
massaging that can be done at the token level?

Here's a sample grammar that parses "2008-10-21T00:00:00 -
2008-10-21T00:00:01" but not "2008-10-21T00:00:00-2008-10-21T00:00:01". Of
course, in the real-world grammar, a range can contain ISO times, not just
dates, which can further complicate matters.

start: range;
range: period '-' period;
period: ISODateTime;

fragment D: ('0'..'9');
fragment TZO: ('+' | '-') D D ':' D D;
ISODate: D D D D '-' D D '-' D D;
ISOTime: D D ':' D D ':' D D ('.' D)?;
ISODateTime: (ISODate ('T' ISOTime ('Z' | TZO)?)?);

Whitespace: ' ' { $channel = HIDDEN; };


Thanks,
Mihai