[antlr-interest] Greedy Token Matching?

Sam Harwell sharwell at pixelminegames.com
Wed Dec 17 09:51:54 PST 2008


Add a syntactic predicate (synpred) to the TZO alternative of the
ISODateTime rule:

ISODateTime:  (ISODate ('T' ISOTime ('Z' | (TZO) => TZO)?)?);

Without the synpred, the lexer hits the range dash and eats it,
entering the TZO fragment. I'm not sure whether this is the intended
behavior: if TZO and DASH were top-level lexer rules, it would only
enter TZO if LA(4)==':'.
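
To illustrate what the predicate buys you, here is a minimal sketch in
Python rather than ANTLR. It is not ANTLR's generated code; the names
DATE, TIME, TZO, DASH and tokenize are hypothetical stand-ins that only
mirror the grammar. Python's re module backtracks, so the optional
offset is taken only when a complete [+-]DD:DD follows, which is roughly
the check that (TZO) => TZO asks the lexer to make before committing.

```python
import re

# Hypothetical stand-ins for the lexer rules; the names mirror the grammar.
DATE = r"\d{4}-\d{2}-\d{2}"
TIME = r"\d{2}:\d{2}:\d{2}(?:\.\d)?"
TZO = r"[+-]\d{2}:\d{2}"

# The optional offset matches only when a complete TZO follows; otherwise
# the regex engine backs out and leaves the dash for the next token.
ISO_DATE_TIME = re.compile(rf"{DATE}(?:T{TIME}(?:Z|{TZO})?)?")
DASH = re.compile(r"-")


def tokenize(text):
    """Split text into (type, lexeme) pairs, skipping spaces."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] == " ":
            pos += 1
            continue
        for name, pattern in (("ISODateTime", ISO_DATE_TIME), ("DASH", DASH)):
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"unexpected character at offset {pos}")
    return tokens
```

Under these assumptions, tokenize("2008-10-21T00:00:00-2008-10-21T00:00:01")
yields a date-time, a dash, and a second date-time, while an input with a
real offset such as "2008-10-21T00:00:00-05:00" keeps the offset inside
the date-time token.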

Sam

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Mihai Danila
Sent: Wednesday, December 17, 2008 11:43 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Greedy Token Matching?

Hello,

As part of a language I'm writing a parser for, I use ISO dates that may
have this form (spaces added for clarity):

yyyy-MM-dd ( T hh:mm:ss (+/- HH:MM)? )?

The language also allows for dash-separated ranges of dates, like this:
date-date, or, like in DMQL, date- (meaning "date" or less).

The arrangement that seemed to work best for me specified ISO dates as
tokens. In other words, an ISO date is matched by a lexer token, not by
a parser rule. This all works fine, except for expressions of the form

2008-10-21T00:00:00-2008-10-21T00:00:01

which cause the ANTLR lexer to drop the date rule when it detects the
range dash.

2008-10-21T00:00:00-

The lexer gets this far, but then it assumes the dash is part of the
date token (the start of a time zone offset) and chokes later on,
instead of keeping the "2008-10-21T00:00:00" portion and giving the
range rule a chance to match the dash as a range dash.
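
The failure described above can be reproduced with a small hand-written
matcher. This is a sketch in Python, not ANTLR output, and
match_iso_datetime is a hypothetical helper. It assumes the lexer decides
whether to enter the time zone offset by looking at a single character,
which is how the behavior appears from the outside.

```python
def match_iso_datetime(text, pos=0):
    """Match one ISODateTime at pos, choosing the offset branch on one character."""
    start = pos

    def digits(n):
        # Consume exactly n ASCII digits or fail.
        nonlocal pos
        for _ in range(n):
            if pos >= len(text) or text[pos] not in "0123456789":
                raise ValueError(f"expected digit at offset {pos}")
            pos += 1

    def expect(ch):
        # Consume the literal character ch or fail.
        nonlocal pos
        if pos >= len(text) or text[pos] != ch:
            raise ValueError(f"expected {ch!r} at offset {pos}")
        pos += 1

    def peek():
        return text[pos] if pos < len(text) else ""

    # ISODate: D D D D '-' D D '-' D D
    digits(4); expect("-"); digits(2); expect("-"); digits(2)
    if peek() == "T":
        pos += 1
        # ISOTime: D D ':' D D ':' D D ('.' D)?
        digits(2); expect(":"); digits(2); expect(":"); digits(2)
        if peek() == ".":
            pos += 1
            digits(1)
        if peek() == "Z":
            pos += 1
        elif peek() in ("+", "-"):
            # Assumption: commit to TZO after seeing only the sign.
            pos += 1
            digits(2); expect(":"); digits(2)
    return text[start:pos]
```

With this one-character decision, "2008-10-21T00:00:00-05:00" matches as
a whole, but "2008-10-21T00:00:00-2008-10-21T00:00:01" raises an error
inside the offset branch because the dash has already been consumed.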

I realize that promoting tokens to rules may solve this, but I'd rather
understand what all the options are before going that route. That route
would impact the grammar in more than one way. Are there any options or
massaging that can be done at the token level?

Here's a sample grammar that parses "2008-10-21T00:00:00 -
2008-10-21T00:00:01" but not "2008-10-21T00:00:00-2008-10-21T00:00:01".
Of course, in the real-world grammar, a range can contain ISO times, not
just dates, which can further complicate matters.

start:        range;

range:        period '-' period;

period:       ISODateTime;

fragment D:   ('0'..'9');

fragment TZO: ('+' | '-') D D ':' D D;

ISODate:  D D D D '-' D D '-' D D;

ISOTime:  D D ':' D D ':' D D ('.' D)?;

ISODateTime:  (ISODate ('T' ISOTime ('Z' | TZO)?)?);

Whitespace:   ' ' { $channel = HIDDEN; };

Thanks,

Mihai
