[antlr-interest] suggested ANTLR projects?
Pete Forman
pete.forman at westerngeco.com
Tue Aug 12 02:01:34 PDT 2003
At 2003-08-11 11:36 -0700, Terence Parr wrote:
>Also, I'm going to see if I can get students to build grammars. Can
>people suggest grammars they want built? They might have to describe
>it to the students. ;)
One pet grammar of mine is that of the international date and time
format ISO 8601:2000. Most people will have come across dates such
as 2003-08-12 but the standard covers many other formats. A summary
can be found at
http://www.iso.org/iso/en/prods-services/popstds/datesandtime.html
A final draft of the standard can be found via
http://www.qsl.net/g1smd/temp/PDF_Links.html
Here is a summary of the grammar that might form the basis of a parser.
The goal ought to be to recognize all the examples in the standard.
5 Representations
5.1 Explanations
5.1.1 Characters used in place of digits or signs: YMDwhmsn+
[+ should be plus_or_minus]
5.1.2 Characters used as designators: PRTWZDHMS
[D and M are used both in place of digits and as designators in durations]
4.4 The space character shall not be used in the representations
[but a common misuse of ISO8601 uses space instead of T]
Lower case characters may be substituted for upper case
4.5 Characters used as separators: -:/#,.
[the FDIS is inconsistent, # is probably not used at all]
5.2 Dates
5.2.1 Calendar date
5.2.1.1 Complete representation
5.2.1.1.B: YYYYMMDD
5.2.1.1.E: YYYY-MM-DD
5.2.1.2 Representations with reduced precision
5.2.1.2.a.B: YYYY-MM
5.2.1.2.b.B: YYYY
5.2.1.2.c.B: YY
5.2.1.3 Truncated representations
5.2.1.3.a.B: YYMMDD
5.2.1.3.a.E: YY-MM-DD
5.2.1.3.b.B: -YYMM
5.2.1.3.b.E: -YY-MM
5.2.1.3.c.B: -YY
5.2.1.3.d.B: --MMDD
5.2.1.3.d.E: --MM-DD
5.2.1.3.e.B: --MM
5.2.1.3.f.B: ---DD
5.2.1.4 Expanded representations (optional, here year has 2 extra digits)
5.2.1.4.a.B: +YYYYYYMMDD
5.2.1.4.a.E: +YYYYYY-MM-DD
5.2.1.4.b.B: +YYYYYY-MM
5.2.1.4.c.B: +YYYYYY
5.2.1.4.d.B: +YYYY
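As a rough illustration of the complete calendar date forms in 5.2.1.1 only, here is a minimal Python sketch. The names CALENDAR_DATE and parse_calendar_date are made up for this example; the reduced, truncated and expanded forms are deliberately not handled. A backreference keeps the separators consistent, so a half-basic, half-extended string is rejected.

```python
import re
from datetime import date

# Complete calendar date (5.2.1.1): basic YYYYMMDD or extended YYYY-MM-DD.
# Group 2 captures the optional hyphen; \2 forces the second separator to match it.
CALENDAR_DATE = re.compile(r"([0-9]{4})(-?)([0-9]{2})\2([0-9]{2})")

def parse_calendar_date(text):
    """Return a datetime.date for a complete calendar date, else None."""
    m = CALENDAR_DATE.fullmatch(text)
    if m is None:
        return None
    year, _, month, day = m.groups()
    try:
        return date(int(year), int(month), int(day))
    except ValueError:  # out-of-range month or day, or year 0000
        return None
```

For instance, parse_calendar_date("2003-08-12") and parse_calendar_date("20030812") give the same date, while "2003-0812" is rejected.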
5.2.2 Ordinal date
5.2.2.1 Complete representation
5.2.2.1.B: YYYYDDD
5.2.2.1.E: YYYY-DDD
5.2.2.2 Truncated representations
5.2.2.2.B: YYDDD
5.2.2.2.E: YY-DDD
5.2.2.3 Expanded representations (optional, here year has 2 extra digits)
5.2.2.3.B: +YYYYYYDDD
5.2.2.3.E: +YYYYYY-DDD
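A small Python sketch of the complete ordinal date forms in 5.2.2.1 might look like the following. The function name parse_ordinal_date is an assumption of this example, and the truncated and expanded forms are left out. The day of the year is range-checked against the length of the year before it is converted.

```python
import calendar
import re
from datetime import date, timedelta

# Complete ordinal date (5.2.2.1): basic YYYYDDD or extended YYYY-DDD.
ORDINAL_DATE = re.compile(r"([0-9]{4})-?([0-9]{3})")

def parse_ordinal_date(text):
    """Return a datetime.date for a complete ordinal date, else None."""
    m = ORDINAL_DATE.fullmatch(text)
    if m is None:
        return None
    year, day_of_year = int(m.group(1)), int(m.group(2))
    if year < 1:  # datetime.date cannot represent year 0000
        return None
    days_in_year = 366 if calendar.isleap(year) else 365
    if not 1 <= day_of_year <= days_in_year:
        return None
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)
```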
5.2.3 Week date
5.2.3.1 Complete representation
5.2.3.1.B: YYYYWwwD
5.2.3.1.E: YYYY-Www-D
5.2.3.2 Representation with reduced precision
5.2.3.2.a.B: YYYYWww
5.2.3.2.a.E: YYYY-Www
5.2.3.3 Truncated representations
5.2.3.3.a.B: YYWwwD
5.2.3.3.a.E: YY-Www-D
5.2.3.3.b.B: YYWww
5.2.3.3.b.E: YY-Www
5.2.3.3.c.B: -YWwwD
5.2.3.3.c.E: -Y-Www-D
5.2.3.3.d.B: -YWww
5.2.3.3.d.E: -Y-Www
5.2.3.3.e.B: -WwwD
5.2.3.3.e.E: -Www-D
5.2.3.3.f.B: -Www
5.2.3.3.g.B: -W-D
5.2.3.4 Expanded representations (optional, here year has 2 extra digits)
5.2.3.4.a.B: +YYYYYYWwwD
5.2.3.4.a.E: +YYYYYY-Www-D
5.2.3.4.b.B: +YYYYYYWww
5.2.3.4.b.E: +YYYYYY-Www
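To give a feel for the complete week date forms in 5.2.3.1, here is a hedged Python sketch. It relies on date.fromisocalendar, which exists in Python 3.8 and later; the name parse_week_date is invented for this example, and reduced, truncated and expanded forms are not covered.

```python
import re
from datetime import date

# Complete week date (5.2.3.1): basic YYYYWwwD or extended YYYY-Www-D.
# The backreference \2 requires both separators to be present or both absent.
WEEK_DATE = re.compile(r"([0-9]{4})(-?)[Ww]([0-9]{2})\2([1-7])")

def parse_week_date(text):
    """Return a datetime.date for a complete week date, else None."""
    m = WEEK_DATE.fullmatch(text)
    if m is None:
        return None
    year, _, week, weekday = m.groups()
    try:
        # Requires Python 3.8+; raises ValueError for e.g. week 00 or week 54.
        return date.fromisocalendar(int(year), int(week), int(weekday))
    except ValueError:
        return None
```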
5.3 Time of the day
5.3.1 Local time of the day
5.3.1.1 Complete representation
5.3.1.1.B: hhmmss
5.3.1.1.E: hh:mm:ss
5.3.1.2 Representations with reduced precision
5.3.1.2.a.B: hhmm
5.3.1.2.a.E: hh:mm
5.3.1.2.b.B: hh
5.3.1.3 Representation of decimal fractions (may use . instead of ,)
(fractions shown here with two places, spec is one or more)
5.3.1.3.a.B: hhmmss,ss
5.3.1.3.a.E: hh:mm:ss,ss
5.3.1.3.b.B: hhmm,mm
5.3.1.3.b.E: hh:mm,mm
5.3.1.3.c.B: hh,hh
5.3.1.4 Truncated representations
(fractions shown here with one place, spec is one or more)
5.3.1.4.a.B: -mmss
5.3.1.4.a.E: -mm:ss
5.3.1.4.b.B: -mm
5.3.1.4.c.B: --ss
5.3.1.4.d.B: -mmss,s
5.3.1.4.d.E: -mm:ss,s
5.3.1.4.e.B: -mm,m
5.3.1.4.f.B: --ss,s
5.3.1.5 Representation with time designator
If the time of the day is represented in basic format in a context that does
not clearly identify a time only expression, the time designator [T] shall be
used immediately in front of the representations defined in 5.3.1.1 through
5.3.1.3.
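The forms in 5.3.1.1 through 5.3.1.3 can be sketched in Python roughly as below. This is only an illustration: the name seconds_since_midnight is hypothetical, truncated forms and the leading [T] are not handled, and a seconds value of 60 is allowed on the assumption that leap seconds should be accepted. The decimal fraction is applied to whichever component is lowest in the string.

```python
import re
from decimal import Decimal

# hh, hhmm, hhmmss or hh:mm, hh:mm:ss, each with an optional decimal fraction.
# \2 makes the second separator agree with the first (all basic or all extended).
TIME_OF_DAY = re.compile(
    r"([0-9]{2})(?:(:?)([0-9]{2})(?:\2([0-9]{2}))?)?(?:[,.]([0-9]+))?"
)

def seconds_since_midnight(text):
    """Return the time of day as a Decimal number of seconds, else None."""
    m = TIME_OF_DAY.fullmatch(text)
    if m is None:
        return None
    hh, _, mm, ss, frac = m.groups()
    hours, minutes, seconds = int(hh), int(mm or 0), int(ss or 0)
    fraction = Decimal("0." + frac) if frac else Decimal(0)
    if hours > 24 or minutes > 59 or seconds > 60:
        return None
    if hours == 24 and (minutes or seconds or fraction):
        return None  # 24 is only meaningful as midnight at end of day
    # The fraction belongs to the lowest-order component that is present.
    if ss is not None:
        return hours * 3600 + minutes * 60 + seconds + fraction
    if mm is not None:
        return hours * 3600 + (minutes + fraction) * 60
    return (hours + fraction) * 3600
```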
5.3.2 Midnight
For midnight, in the 5.3.1.* representations hh is either 00 or 24 and both mm
and ss are 00.
5.3.3 Coordinated Universal Time (UTC)
To express the time of the day in Coordinated Universal Time, the
representations specified in 5.3.1.1 through 5.3.1.3 shall be used, followed
immediately, without spaces, by the UTC designator [Z].
5.3.4 Local time and Coordinated Universal Time
5.3.4.1 Difference between local time and Coordinated Universal Time
5.3.4.1.a.B: +hhmm
5.3.4.1.a.E: +hh:mm
5.3.4.1.b.B: +hh
5.3.4.2 Local time and the difference with Coordinated Universal Time
5.3.1*B plus 5.3.4.1.*.B
5.3.1*E plus 5.3.4.1.a.E or 5.3.4.1.b.B
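The zone designators of 5.3.3 and 5.3.4.1 could be recognized along these lines in Python. The function name parse_zone is an assumption for this sketch, and it accepts either the basic or the extended offset form without checking which format the preceding time used.

```python
import re
from datetime import timedelta, timezone

# Either the UTC designator Z, or a signed offset: +hh, +hhmm or +hh:mm.
ZONE = re.compile(r"[Zz]|([+-])([0-9]{2})(?::?([0-9]{2}))?")

def parse_zone(text):
    """Return a datetime.timezone for a zone designator, else None."""
    m = ZONE.fullmatch(text)
    if m is None:
        return None
    if m.group(1) is None:  # matched the Z alternative
        return timezone.utc
    sign = -1 if m.group(1) == "-" else 1
    hours, minutes = int(m.group(2)), int(m.group(3) or 0)
    if minutes > 59:
        return None
    try:
        return timezone(sign * timedelta(hours=hours, minutes=minutes))
    except ValueError:  # offset magnitude of 24 hours or more
        return None
```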
5.4 Combinations of date and time of the day
5.4.1 Complete representation
5.4.1.a: year month day timeDesignator hour minute second zoneDesignator
5.4.1.b: year day timeDesignator hour minute second zoneDesignator
5.4.1.c: year weekDesignator week day timeDesignator hour minute second zoneDesignator
5.4.2 Representations other than complete
5.2.* plus T plus 5.3.4.2
provided that
a) the rules specified in those sections are applied;
b) the resulting expression does not qualify as a complete representation in
accordance with 5.4.1;
c) the date component shall not be represented with reduced precision and the
time component shall not be truncated. Note that this excludes the date
representations in 5.2.1.3 and 5.2.3.3 that are truncated and reduced and
the date representations in 5.2.1.4 and 5.2.3.4 that are expanded and
reduced;
d) the expression shall either be completely in basic format, in which case
the minimum number of separators necessary for the required expression is
used, or completely in extended format, in which case additional separators
shall be used in accordance with 5.2 and 5.3.
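The requirement that a combined expression be completely basic or completely extended can be checked as in this Python sketch. It covers only the calendar date form of 5.4.1.a, leaves the zone designator out for brevity, and uses the made-up name parse_calendar_datetime.

```python
import re
from datetime import datetime

# Calendar date, time designator T, then hour minute second.
# Group 2 is the date separator and group 6 is the time separator.
COMBINED = re.compile(
    r"([0-9]{4})(-?)([0-9]{2})\2([0-9]{2})"
    r"[Tt]([0-9]{2})(:?)([0-9]{2})\6([0-9]{2})"
)

def parse_calendar_datetime(text):
    """Return a naive datetime for a complete combined form, else None."""
    m = COMBINED.fullmatch(text)
    if m is None:
        return None
    # Reject mixtures such as an extended date with a basic time.
    if (m.group(2) == "-") != (m.group(6) == ":"):
        return None
    fields = [int(m.group(i)) for i in (1, 3, 4, 5, 7, 8)]
    try:
        return datetime(*fields)
    except ValueError:  # out-of-range field, including hour 24
        return None
```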
5.5 Time-intervals
5.5.1 Means of specifying time-intervals
A time-interval shall be expressed in one of the following ways:
a) by a start and an end;
b) by a duration not associated with any start or end;
c) by a start and a duration;
d) by a duration and an end.
5.5.2 Separators and designators
A time interval is expressed according to the following rules:
a) a solidus [/] shall be used to separate the two components in each of
5.5.1 a), c) and d).
b) for 5.5.1 b), c) and d) the designator [P] shall precede, without spaces,
the representation of the duration.
c) other designators (and the hyphen when used to indicate omitted components)
shall be used as shown in 5.5.4 and 5.5.5 below.
NOTE In certain application areas a double hyphen is used as a separator
instead of a solidus.
5.5.3 Representation of duration
5.5.3.1 Format with time-unit designators
In expressions of time-interval or recurring time-interval duration can be
represented by a data element using time unit designators. The number of years
shall be followed by the designator [Y], the number of months by [M], the
number of weeks by [W], and the number of days by [D]. The part including time
components shall be preceded by the designator [T]; the number of hours shall
be followed by [H], the number of minutes by [M] and the number of seconds by
[S]. In the examples [n] represents one or more digits, constituting a
positive integer or zero.
In basic and extended format the complete representation for duration shall be
nYnMnDTnHnMnS or nW.
For reduced precision, decimal or truncated representations of this format the
following rules apply.
a) If necessary for a particular application the lowest order components may
be omitted to represent duration with reduced precision.
b) If necessary for a particular application the lowest order component may
have a decimal fraction. The decimal fraction shall be divided from the
integer part by the decimal sign specified in ISO 31-0: i.e. the comma [,]
or full stop [.]. Of these, the comma is the preferred sign. The decimal
fraction shall at least have one digit. If the magnitude of the number is
less than unity, the decimal sign shall be preceded by a zero (see ISO 31-0).
c) If the number of years, months, days, hours, minutes or seconds in any of
these expressions equals zero, the number and the corresponding designator
may be absent; however, at least one number and its designator shall be
present. Note that the removal of leading non-zero components is not
allowed.
d) The designator T shall be absent if all of the time components are
absent.
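The duration rules above can be sketched in Python as follows. This is one possible reading, not a definitive implementation: parse_duration and the helper names are invented here, a fraction is also allowed on the week form, and the alternative format of 5.5.3.2 is ignored. Note how the [T] designator is what distinguishes months from minutes.

```python
import re
from decimal import Decimal

NUM = r"[0-9]+(?:[,.][0-9]+)?"  # digits, optionally with a decimal fraction
DURATION = re.compile(
    rf"[Pp](?:(?P<weeks>{NUM})[Ww]"
    rf"|(?:(?P<years>{NUM})[Yy])?(?:(?P<months>{NUM})[Mm])?"
    rf"(?:(?P<days>{NUM})[Dd])?"
    rf"(?:(?P<t>[Tt])(?:(?P<hours>{NUM})[Hh])?(?:(?P<minutes>{NUM})[Mm])?"
    rf"(?:(?P<seconds>{NUM})[Ss])?)?)"
)
ORDER = ("weeks", "years", "months", "days", "hours", "minutes", "seconds")
TIME_PARTS = ("hours", "minutes", "seconds")

def parse_duration(text):
    """Return a dict of component name to Decimal, else None."""
    m = DURATION.fullmatch(text)
    if m is None:
        return None
    present = [(n, m.group(n)) for n in ORDER if m.group(n) is not None]
    if not present:
        return None  # rule c): at least one number and designator
    if m.group("t") and not any(n in TIME_PARTS for n, _ in present):
        return None  # rule d): no T without time components
    for _, value in present[:-1]:
        if "," in value or "." in value:
            return None  # rule b): only the lowest order component may be fractional
    return {n: Decimal(v.replace(",", ".")) for n, v in present}
```

With this sketch "P1M" yields one month and "PT1M" yields one minute, while "P", "P1YT" and "P1,5Y2M" are all rejected.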
5.5.3.2 Alternative format (optional)
5.5.4 Complete representations
5.5.4.1 Representation of time-intervals identified by start and end
5.4.1.* / 5.4.1.*
5.5.4.2 Representation of time-interval by duration only
5.5.4.2.1 Format with time-unit designators
5.5.4.2.1.a.BE: PnYnMnDTnHnMnS
5.5.4.2.1.b.BE: PnW
5.5.4.2.2 Alternative format (optional)
5.5.4.2.2.B: PYYYYMMDDThhmmss
5.5.4.2.2.E: PYYYY-MM-DDThh:mm:ss
5.5.4.3 Representation of time-interval identified by its start and its duration
5.5.4.3.B: 5.4.1.*.B / 5.5.3.*.B
5.5.4.3.E: 5.4.1.*.E / 5.5.3.*.E
5.5.4.4 Representation of time-interval identified by its duration and its end
5.5.4.4.B: 5.5.3.*.B / 5.4.1.*.B
5.5.4.4.E: 5.5.3.*.E / 5.4.1.*.E
5.5.5 Representations other than complete
A representation other than complete of a time-interval shall be an expression
in accordance with 5.5.1 and 5.5.2, where time-points are represented in
accordance with 5.2, 5.3 or 5.4 and where duration is represented in
accordance with 5.5.3.1 or 5.5.3.2, provided that:
a) the rules specified in those sections are applied;
b) the result is not a complete representation in accordance with 5.5.4, and
c) for which the resulting expression is either consistently in basic format
or consistently in extended format;
d) the use of a representation needs to be agreed by the partners in
information interchange, if the use of any of its constituent parts needs
to be agreed by the partners in information interchange.
In the representation of time-intervals in accordance with 5.5.1 a),
- if higher order components are omitted from the expression following the
solidus (i.e. the representation for "end of time-interval"), it shall be
assumed that the corresponding components from the "start of time-interval"
expression apply (e.g. if [YYYYMM] are omitted by using a derived
representation, the end of the time-interval is in the same year and month
as the start of the time-interval);
- representations for time-zones and Coordinated Universal Time included with
the component preceding the solidus shall be assumed to apply to the
component following the solidus, unless a corresponding alternative is
included.
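The four ways of writing a time-interval from 5.5.1, with the separators from 5.5.2, can be told apart with a very small Python sketch. The name classify_interval is hypothetical, and the components themselves are not validated; only the solidus and the leading [P] are examined.

```python
def classify_interval(text):
    """Return which of the 5.5.1 forms a string has, else None."""
    parts = text.split("/")
    if any(part == "" for part in parts):
        return None
    # A component is a duration exactly when it starts with the designator P.
    is_duration = [part[0] in "Pp" for part in parts]
    if len(parts) == 1:
        return "duration" if is_duration[0] else None
    if len(parts) != 2:
        return None
    if is_duration == [False, False]:
        return "start/end"
    if is_duration == [False, True]:
        return "start/duration"
    if is_duration == [True, False]:
        return "duration/end"
    return None  # two durations do not form an interval
```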
5.6 Recurring time-intervals
5.6.1 Means of specifying recurring time-intervals
A recurring time-interval shall be expressed in one of the following ways:
a) By a number of recurrences (optional), a start and an end. This represents
a recurring time-interval of which the first time-interval is identified by
the first two components of the expression and the number of recurrences by
the last component. If the last component is absent the number of
occurrences is unbounded.
b) By a number of recurrences (optional) and a duration. This represents a
recurring time interval with the indicated duration for each time-interval
and with the indicated number of recurrences. If the number of recurrences
is absent the number of occurrences is unbounded.
c) By a number of recurrences (optional) a start and a duration. This
represents a recurring time-interval of which the first time-interval is
identified by the first two components of the expression and the number of
recurrences by the last component. If the last component is absent the
number of occurrences is unbounded.
d) By a number of recurrences (optional), a duration and an end. This
represents a recurring time-interval of which the last time-interval is
identified by the first two components of the expression and the number of
recurrences by the last component. If the last component is absent the
number of occurrences is unbounded.
5.6.2 Separators and designators
All representations start with the designator [R], followed, without spaces,
by the number of recurrences, if present, followed, without spaces, by a
solidus [/], followed, without spaces, by the expression of a time interval
in accordance with 5.5.1. For the representation 5.6.1 a), 5.6.1 b), 5.6.1 c)
and 5.6.1 d) the time interval in accordance with 5.5.1 a), 5.5.1 b), 5.5.1 c)
and 5.5.1 d) shall be used respectively.
5.6.3 Complete representations
5.6.3: Rn / 5.5.*
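The outer shape of a recurring time-interval, as described in 5.6.2, could be peeled off as in this Python sketch. The name split_recurring is made up for the example, and the interval after the first solidus is returned unparsed.

```python
import re

# Designator R, an optional number of recurrences, a solidus, then the interval.
RECURRING = re.compile(r"[Rr]([0-9]*)/(.+)")

def split_recurring(text):
    """Return (count or None, interval text), else None if not recurring."""
    m = RECURRING.fullmatch(text)
    if m is None:
        return None
    # An absent count means the number of occurrences is unbounded.
    count = int(m.group(1)) if m.group(1) else None
    return count, m.group(2)
```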
DIGIT: '0'..'9';
HYPHEN_OR_MINUS: '-';
COLON: ':';
SOLIDUS: '/';
DECIMAL: ',' | '.';
PERIOD: 'P' | 'p';
RECUR: 'R' | 'r';
TIME: 'T' | 't' | ' '; // space is illegal but commonly used
WEEK: 'W' | 'w';
ZULU: 'Z' | 'z';
PLUS: '+';
// HYPHEN_OR_MINUS is done above
YEAR: 'Y' | 'y';
MONTH_OR_MINUTE: 'M' | 'm';
// WEEK is done above
DAY: 'D' | 'd';
HOUR: 'H' | 'h';
// MONTH_OR_MINUTE is done above
SECOND: 'S' | 's';
// HASH is probably not part of the Standard
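To see what token stream the character rules above would produce, here is a rough Python imitation. It is only a sketch for experimenting, not ANTLR output; the table and the tokenize function are assumptions of this example. It makes the month/minute ambiguity visible, since both come out as MONTH_OR_MINUTE and only the parser can tell them apart by whether a TIME token has been seen.

```python
# Token name for each single character, mirroring the lexer rules above.
TOKEN_NAMES = {}
for name, chars in [
    ("DIGIT", "0123456789"), ("HYPHEN_OR_MINUS", "-"), ("COLON", ":"),
    ("SOLIDUS", "/"), ("DECIMAL", ",."), ("PERIOD", "Pp"), ("RECUR", "Rr"),
    ("TIME", "Tt "),  # space is illegal but commonly used
    ("WEEK", "Ww"), ("ZULU", "Zz"), ("PLUS", "+"), ("YEAR", "Yy"),
    ("MONTH_OR_MINUTE", "Mm"), ("DAY", "Dd"), ("HOUR", "Hh"), ("SECOND", "Ss"),
]:
    for ch in chars:
        TOKEN_NAMES[ch] = name

def tokenize(text):
    """Return a list of (token name, character) pairs for the input."""
    tokens = []
    for position, ch in enumerate(text):
        if ch not in TOKEN_NAMES:
            raise ValueError(f"unexpected character {ch!r} at {position}")
        tokens.append((TOKEN_NAMES[ch], ch))
    return tokens
```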
--
Pete Forman -./\.- Disclaimer: This post is originated
WesternGeco -./\.- by myself and does not represent
pete.forman at westerngeco.com -./\.- opinion of Schlumberger, Baker
http://petef.port5.com -./\.- Hughes or their divisions.