[antlr-interest] suggested ANTLR projects?

Matthew Ford Matthew.Ford at forward.com.au
Tue Aug 12 03:06:16 PDT 2003


I think this is a great project and I would be interested in the results.
Usually I just specify the date format the user can use.

Another small project would be complex number parsing (with good error
messages). Examples include:
1
1.0
+1.0
1+i   1+j
1-i    1-j
-i
1-5.0i
j5   (perhaps)
etc
I have tried this without using Antlr and quickly gave up and went back to
Antlr to do the parsing and give useful error messages.
The trick I used for error messages was to code some common errors into the
parser and parse them to give an explicit error message,
e.g.
1 5.0i
"missing sign for imaginary part"
1+5
"missing imaginary part - did you forget the 'i' or 'j' "
1j + 2
"real part should be first"      (perhaps this is too strict??)

matthew
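
P.S. A rough sketch of the "parse the common errors explicitly" trick, in
Python with regular expressions rather than an Antlr grammar (so this is not
the parser described above, only an illustration of the idea). The names
parse_complex, NUM and IMAG are made up for this example, and the accepted
forms are just the ones listed in this message.

```python
import re

# Hypothetical helper patterns: an unsigned number, and an imaginary term
# such as "i", "5.0i" or "j5".
NUM = r"(?:\d+(?:\.\d*)?|\.\d+)"
IMAG = rf"(?:{NUM}?[ij]|[ij]{NUM})"


def _imag(sign, body):
    """Turn a sign and an imaginary term like '5.0i' or 'j5' into a float."""
    digits = body.strip("ij")
    value = float(digits) if digits else 1.0
    return -value if sign == "-" else value


def parse_complex(text):
    s = text.strip()
    # Valid forms first: real only, imaginary only, real followed by imaginary.
    m = re.fullmatch(rf"([+-]?{NUM})", s)
    if m:
        return complex(float(m.group(1)), 0.0)
    m = re.fullmatch(rf"([+-]?)({IMAG})", s)
    if m:
        return complex(0.0, _imag(m.group(1), m.group(2)))
    m = re.fullmatch(rf"([+-]?{NUM})\s*([+-])\s*({IMAG})", s)
    if m:
        return complex(float(m.group(1)), _imag(m.group(2), m.group(3)))
    # "Error productions": recognise common mistakes and name them explicitly.
    if re.fullmatch(rf"[+-]?{NUM}\s+{IMAG}", s):
        raise ValueError("missing sign for imaginary part")
    if re.fullmatch(rf"[+-]?{NUM}\s*[+-]\s*{NUM}", s):
        raise ValueError("missing imaginary part - did you forget the 'i' or 'j'")
    if re.fullmatch(rf"[+-]?{IMAG}\s*[+-]\s*{NUM}", s):
        raise ValueError("real part should be first")
    raise ValueError("not a complex number")


print(parse_complex("1-5.0i"))  # (1-5j)
```

Anything that matches neither a valid form nor a known mistake falls through
to a generic message, which is where a real parser's own error reporting
would take over.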

----- Original Message ----- 
From: "Pete Forman" <pete.forman at westerngeco.com>
To: <antlr-interest at yahoogroups.com>
Sent: Tuesday, August 12, 2003 7:01 PM
Subject: Re: [antlr-interest] suggested ANTLR projects?


> At 2003-08-11 11:36 -0700, Terence Parr wrote:
> >Also, I'm going to see if I can get students to build grammars.  Can
> >people suggest grammars they want built?  They might have to describe
> >it to the students. ;)
>
> One pet grammar of mine is that of the international date and time
> format ISO 8601:2000.  Most people will have come across dates such
> as 2003-08-12 but the standard covers many other formats.  A summary
> can be found at
> http://www.iso.org/iso/en/prods-services/popstds/datesandtime.html
>
> A final draft of the standard can be found via
> http://www.qsl.net/g1smd/temp/PDF_Links.html
>
> Here is a summary of the grammar that might form the basis of a parser.
> The goal ought to be to recognize all the examples in the standard.
>
> 5 Representations
> 5.1 Explanations
> 5.1.1 Characters used in place of digits or signs: YMDwhmsn+
>    [+ should be plus_or_minus]
> 5.1.2 Characters used as designators: PRTWZDHMS
>    [D and M are used both in place of digits and as designators in
>    durations]
> 4.4 The space character shall not be used in the representations
>    [but a common misuse of ISO8601 uses space instead of T]
>    Lower case characters may be substituted for upper case
> 4.5 Characters used as separators: -:/#,.
>    [the FDIS is inconsistent, # is probably not used at all]
> 5.2 Dates
> 5.2.1 Calendar date
> 5.2.1.1 Complete representation
> 5.2.1.1.B: YYYYMMDD
> 5.2.1.1.E: YYYY-MM-DD
> 5.2.1.2 Representations with reduced precision
> 5.2.1.2.a.B: YYYY-MM
> 5.2.1.2.b.B: YYYY
> 5.2.1.2.c.B: YY
> 5.2.1.3 Truncated representations
> 5.2.1.3.a.B: YYMMDD
> 5.2.1.3.a.E: YY-MM-DD
> 5.2.1.3.b.B: -YYMM
> 5.2.1.3.b.E: -YY-MM
> 5.2.1.3.c.B: -YY
> 5.2.1.3.d.B: --MMDD
> 5.2.1.3.d.E: --MM-DD
> 5.2.1.3.e.B: --MM
> 5.2.1.3.f.B: ---DD
> 5.2.1.4 Expanded representations (optional, here year has 2 extra digits)
> 5.2.1.4.a.B: +YYYYYYMMDD
> 5.2.1.4.a.E: +YYYYYY-MM-DD
> 5.2.1.4.b.B: +YYYYYY-MM
> 5.2.1.4.c.B: +YYYYYY
> 5.2.1.4.d.B: +YYYY
> 5.2.2 Ordinal date
> 5.2.2.1 Complete representation
> 5.2.2.1.B: YYYYDDD
> 5.2.2.1.E: YYYY-DDD
> 5.2.2.2 Truncated representations
> 5.2.2.2.B: YYDDD
> 5.2.2.2.E: YY-DDD
> 5.2.2.3 Expanded representations (optional, here year has 2 extra digits)
> 5.2.2.3.B:  +YYYYYYDDD
> 5.2.2.3.E:  +YYYYYY-DDD
> 5.2.3 Week date
> 5.2.3.1 Complete representation
> 5.2.3.1.B: YYYYWwwD
> 5.2.3.1.E: YYYY-Www-D
> 5.2.3.2 Representation with reduced precision
> 5.2.3.2.a.B: YYYYWww
> 5.2.3.2.a.E: YYYY-Www
> 5.2.3.3 Truncated representations
> 5.2.3.3.a.B: YYWwwD
> 5.2.3.3.a.E: YY-Www-D
> 5.2.3.3.b.B: YYWww
> 5.2.3.3.b.E: YY-Www
> 5.2.3.3.c.B: -YWwwD
> 5.2.3.3.c.E: -Y-Www-D
> 5.2.3.3.d.B: -YWww
> 5.2.3.3.d.E: -Y-Www
> 5.2.3.3.e.B: -WwwD
> 5.2.3.3.e.E: -Www-D
> 5.2.3.3.f.B: -Www
> 5.2.3.3.g.B: -W-D
> 5.2.3.4 Expanded representations (optional, here year has 2 extra digits)
> 5.2.3.4.a.B: +YYYYYYWwwD
> 5.2.3.4.a.E: +YYYYYY-Www-D
> 5.2.3.4.b.B: +YYYYYYWww
> 5.2.3.4.b.E: +YYYYYY-Www
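
[Aside: as a quick check on the complete date representations listed above
(5.2.1.1, 5.2.2.1 and 5.2.3.1), here is a small Python sketch that classifies
a string as a calendar, ordinal or week date. The name classify_date and the
table layout are invented for this example; it does no range checking
(month 13 is accepted) and ignores the reduced, truncated and expanded forms.]

```python
import re

# (kind, pattern) pairs for the complete representations only.
COMPLETE_DATE = [
    ("calendar", re.compile(r"(\d{4})(\d{2})(\d{2})")),      # 5.2.1.1.B YYYYMMDD
    ("calendar", re.compile(r"(\d{4})-(\d{2})-(\d{2})")),    # 5.2.1.1.E YYYY-MM-DD
    ("ordinal", re.compile(r"(\d{4})-?(\d{3})")),            # 5.2.2.1.B/E YYYY[-]DDD
    ("week", re.compile(r"(\d{4})W(\d{2})(\d)", re.I)),      # 5.2.3.1.B YYYYWwwD
    ("week", re.compile(r"(\d{4})-W(\d{2})-(\d)", re.I)),    # 5.2.3.1.E YYYY-Www-D
]


def classify_date(text):
    """Return (kind, fields) for a complete date, or raise ValueError."""
    for kind, pattern in COMPLETE_DATE:
        m = pattern.fullmatch(text)
        if m:
            return kind, tuple(int(g) for g in m.groups())
    raise ValueError(f"not a complete ISO 8601 date: {text!r}")


print(classify_date("2003-08-12"))   # ('calendar', (2003, 8, 12))
print(classify_date("2003-W33-2"))   # ('week', (2003, 33, 2))
```

The digit counts alone keep the three complete forms apart (8, 7, or 7 with a
W), which is why a flat list of patterns is enough here. The truncated forms
are where a real grammar starts to earn its keep.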
> 5.3 Time of the day
> 5.3.1 Local time of the day
> 5.3.1.1 Complete representation
> 5.3.1.1.B: hhmmss
> 5.3.1.1.E: hh:mm:ss
> 5.3.1.2 Representations with reduced precision
> 5.3.1.2.a.B: hhmm
> 5.3.1.2.a.E: hh:mm
> 5.3.1.2.b.B: hh
> 5.3.1.3 Representation of decimal fractions (may use . instead of ,)
>    (fractions shown here with two places, spec is one or more)
> 5.3.1.3.a.B: hhmmss,ss
> 5.3.1.3.a.E: hh:mm:ss,ss
> 5.3.1.3.b.B: hhmm,mm
> 5.3.1.3.b.E: hh:mm,mm
> 5.3.1.3.c.B: hh,hh
> 5.3.1.4 Truncated representations
>    (fractions shown here with one place, spec is one or more)
> 5.3.1.4.a.B: -mmss
> 5.3.1.4.a.E: -mm:ss
> 5.3.1.4.b.B: -mm
> 5.3.1.4.c.B: --ss
> 5.3.1.4.d.B: -mmss,s
> 5.3.1.4.d.E: -mm:ss,s
> 5.3.1.4.e.B: -mm,m
> 5.3.1.4.f.B: --ss,s
> 5.3.1.5 Representation with time designator
>    If the time of the day is represented in basic format in a context that
>    does not clearly identify a time only expression, the time designator [T]
>    shall be used immediately in front of the representations defined in
>    5.3.1.1 through 5.3.1.3.
> 5.3.2 Midnight
>    In 5.3.1.* hh is either 00 or 24 and mm is 00.
> 5.3.3 Coordinated Universal Time (UTC)
>    To express the time of the day in Coordinated Universal Time, the
>    representations specified in 5.3.1.1 through 5.3.1.3 shall be used,
>    followed immediately, without spaces, by the UTC designator [Z].
> 5.3.4 Local time and Coordinated Universal Time
> 5.3.4.1 Difference between local time and Coordinated Universal Time
> 5.3.4.1.a.B: +hhmm
> 5.3.4.1.a.E: +hh:mm
> 5.3.4.1.b.B: +hh
> 5.3.4.2 Local time and the difference with Coordinated Universal Time
> 5.3.1*B plus 5.3.4.1.*.B
> 5.3.1*E plus 5.3.4.1.a.E or 5.3.4.1.b.B
> 5.4 Combinations of date and time of the day
> 5.4.1 Complete representation
> 5.4.1.a: year month day timeDesignator hour minute second zoneDesignator
> 5.4.1.b: year day timeDesignator hour minute second zoneDesignator
> 5.4.1.c: year weekDesignator week day timeDesignator hour minute second
>          zoneDesignator
> 5.4.2 Representations other than complete
> 5.2.* plus T plus 5.3.4.2
>    provided that
>    a) the rules specified in those sections are applied;
>    b) the resulting expression does not qualify as a complete
>       representation in accordance with 5.4.1;
>    c) the date component shall not be represented with reduced precision
>       and the time component shall not be truncated. Note that this excludes
>       the date representations in 5.2.1.3 and 5.2.3.3 that are truncated and
>       reduced and the date representations in 5.2.1.4 and 5.2.3.4 that are
>       expanded and reduced;
>    d) the expression shall either be completely in basic format, in which
>       case the minimum number of separators necessary for the required
>       expression is used, or completely in extended format, in which case
>       additional separators shall be used in accordance with 5.2 and 5.3.
> 5.5 Time-intervals
> 5.5.1 Means of specifying time-intervals
>    A time-interval shall be expressed in one of the following ways:
>    a) by a start and an end;
>    b) by a duration not associated with any start or end;
>    c) by a start and a duration;
>    d) by a duration and an end.
> 5.5.2 Separators and designators
>    A time interval is expressed according to the following rules:
>    a) a solidus [/] shall be used to separate the two components in each of
>       5.5.1 a), c) and d).
>    b) for 5.5.1 b), c) and d) the designator [P] shall precede, without
>       spaces, the representation of the duration.
>    c) other designators (and the hyphen when used to indicate omitted
>       components) shall be used as shown in 5.5.4 and 5.5.5 below.
>    NOTE In certain application areas a double hyphen is used as a separator
>       instead of a solidus.
> 5.5.3 Representation of duration
> 5.5.3.1 Format with time-unit designators
>    In expressions of time-interval or recurring time-interval duration can
>    be represented by a data element using time unit designators. The number
>    of years shall be followed by the designator [Y], the number of months by
>    [M], the number of weeks by [W], and the number of days by [D]. The part
>    including time components shall be preceded by the designator [T]; the
>    number of hours shall be followed by [H], the number of minutes by [M]
>    and the number of seconds by [S]. In the examples [n] represents one or
>    more digits, constituting a positive integer or zero.
>
>    In basic and extended format the complete representation for duration
>    shall be nYnMnDTnHnMnS or nW.
>
>    For reduced precision, decimal or truncated representations of this
>    format the following rules apply.
>    a) If necessary for a particular application the lowest order components
>       may be omitted to represent duration with reduced precision.
>    b) If necessary for a particular application the lowest order component
>       may have a decimal fraction. The decimal fraction shall be divided
>       from the integer part by the decimal sign specified in ISO 31-0: i.e.
>       the comma [,] or full stop [.]. Of these, the comma is the preferred
>       sign. The decimal fraction shall at least have one digit. If the
>       magnitude of the number is less than unity, the decimal sign shall be
>       preceded by a zero (see ISO 31-0).
>    c) If the number of years, months, days, hours, minutes or seconds in any
>       of these expressions equals zero, the number and the corresponding
>       designator may be absent; however, at least one number and its
>       designator shall be present. Note that the removal of leading non-zero
>       components is not allowed.
>    d) The designator T shall be absent if all of the time components are
>       absent.
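
[Aside: the designator format in 5.5.3.1 is regular enough to try out with a
single regular expression. The Python sketch below is only an approximation
under stated assumptions: parse_duration and DURATION are names invented
here, decimal fractions from rule b) are not handled, and the leading P from
5.5.2 b) is required. It does show how the T designator decides whether an
M means months or minutes, and it enforces rules c) and d).]

```python
import re

# PnW, or PnYnMnDTnHnMnS with every component optional. An M before the T
# is months; an M after the T is minutes.
DURATION = re.compile(
    r"P(?:(?P<weeks>\d+)W"
    r"|(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?)",
    re.IGNORECASE,
)


def parse_duration(text):
    """Return a dict of the components present, e.g. {'minutes': 5}."""
    m = DURATION.fullmatch(text)
    if m is None:
        raise ValueError(f"not a duration: {text!r}")
    parts = {k: int(v) for k, v in m.groupdict().items() if v is not None}
    if not parts:
        # Rule c): at least one number and its designator shall be present.
        raise ValueError("at least one number and its designator is required")
    if "T" in text.upper() and not {"hours", "minutes", "seconds"} & parts.keys():
        # Rule d): T shall be absent if all time components are absent.
        raise ValueError("designator T present but no time components follow")
    return parts


print(parse_duration("P5M"))   # {'months': 5}
print(parse_duration("PT5M"))  # {'minutes': 5}
```

The regular expression is deliberately permissive and the two checks after
the match carry the rules that are awkward to express in the pattern itself.
In a grammar the same checks could be semantic predicates or explicit error
alternatives.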
> 5.5.3.2 Alternative format (optional)
> 5.5.4 Complete representations
> 5.5.4.1 Representation of time-intervals identified by start and end
> 5.4.1.* / 5.4.1.*
> 5.5.4.2 Representation of time-interval by duration only
> 5.5.4.2.1 Format with time-unit designators
> 5.5.4.2.1.a.BE: PnYnMnDTnHnMnS
> 5.5.4.2.1.b.BE: PnW
> 5.5.4.2.2 Alternative format (optional)
> 5.5.4.2.2.B: PYYYYMMDDThhmmss
> 5.5.4.2.2.E: PYYYY-MM-DDThh:mm:ss
> 5.5.4.3 Representation of time-interval identified by its start and its
>         duration
> 5.5.4.3.B: 5.4.1.*.B / 5.5.3.*.B
> 5.5.4.3.E: 5.4.1.*.E / 5.5.3.*.E
> 5.5.4.4 Representation of time-interval identified by its duration and its
>         end
> 5.5.4.4.B: 5.5.3.*.B / 5.4.1.*.B
> 5.5.4.4.E: 5.5.3.*.E / 5.4.1.*.E
> 5.5.5 Representations other than complete
>    A representation other than complete of a time-interval shall be an
>    expression in accordance with 5.5.1 and 5.5.2, where time-points are
>    represented in accordance with 5.2, 5.3 or 5.4 and where duration is
>    represented in accordance with 5.5.3.1 or 5.5.3.2, provided that:
>    a) the rules specified in those sections are applied;
>    b) the result is not a complete representation in accordance with 5.5.4,
>       and
>    c) for which the resulting expression is either consistently in basic
>       format or consistently in extended format;
>    d) the use of a representation needs to be agreed by the partners in
>       information interchange, if the use of any of its constituent parts
>       needs to be agreed by the partners in information interchange.
>    In the representation of time-intervals in accordance with 5.5.1 a),
>    - if higher order components are omitted from the expression following
>      the solidus (i.e. the representation for "end of time-interval"), it
>      shall be assumed that the corresponding components from the "start of
>      time-interval" expression apply (e.g. if [YYYYMM] are omitted by using
>      a derived representation, the end of the time-interval is in the same
>      year and month as the start of the time-interval);
>    - representations for time-zones and Coordinated Universal Time included
>      with the component preceding the solidus shall be assumed to apply to
>      the component following the solidus, unless a corresponding alternative
>      is included.
> 5.6 Recurring time-intervals
> 5.6.1 Means of specifying recurring time-intervals
>    A recurring time-interval shall be expressed in one of the following
>    ways:
>    a) By a number of recurrences (optional), a start and an end. This
>       represents a recurring time-interval of which the first time-interval
>       is identified by the first two components of the expression and the
>       number of recurrences by the last component. If the last component is
>       absent the number of occurrences is unbounded.
>    b) By a number of recurrences (optional) and a duration. This represents
>       a recurring time interval with the indicated duration for each
>       time-interval and with the indicated number of recurrences. If the
>       number of recurrences is absent the number of occurrences is
>       unbounded.
>    c) By a number of recurrences (optional) a start and a duration. This
>       represents a recurring time-interval of which the first time-interval
>       is identified by the first two components of the expression and the
>       number of recurrences by the last component. If the last component is
>       absent the number of occurrences is unbounded.
>    d) By a number of recurrences (optional), a duration and an end. This
>       represents a recurring time-interval of which the last time-interval
>       is identified by the first two components of the expression and the
>       number of recurrences by the last component. If the last component is
>       absent the number of occurrences is unbounded.
> 5.6.2 Separators and designators
>    All representations start with the designator [R], followed, without
>    spaces, by the number of recurrences, if present, followed, without
>    spaces, by a solidus [/], followed, without spaces, by the expression of
>    a time interval in accordance with 5.5.1. For the representation
>    5.6.1 a), 5.6.1 b), 5.6.1 c) and 5.6.1 d) the time interval in accordance
>    with 5.5.1 a), 5.5.1 b), 5.5.1 c) and 5.5.1 d) shall be used
>    respectively.
> 5.6.3 Complete representations
> 5.6.3: Rn / 5.5.*
>
> DIGIT:   '0'..'9';
> HYPHEN_OR_MINUS:  '-';
> COLON:   ':';
> SOLIDUS: '/';
> DECIMAL: ',' | '.';
> PERIOD:  'P' | 'p';
> RECUR:   'R' | 'r';
> TIME:    'T' | 't' | ' '; // space is illegal but commonly used
> WEEK:    'W' | 'w';
> ZULU:    'Z' | 'z';
> PLUS:    '+';
> // HYPHEN_OR_MINUS is done above
> YEAR:    'Y' | 'y';
> MONTH_OR_MINUTE: 'M' | 'm';
> // WEEK is done above
> DAY:     'D' | 'd';
> HOUR:    'H' | 'h';
> // MONTH_OR_MINUTE is done above
> SECOND:  'S' | 's';
> // HASH is probably not part of the Standard
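
[Aside: to see what token stream the lexer rules above would hand to a
parser, here is a character-level tokenizer in Python that mirrors them one
for one. It is a sketch for experimenting, not generated ANTLR output; the
TOKENS table and the tokenize function are names made up for this example.]

```python
# Upper-case character -> token name, following the lexer rules above.
TOKENS = {
    "-": "HYPHEN_OR_MINUS", ":": "COLON", "/": "SOLIDUS",
    ",": "DECIMAL", ".": "DECIMAL",
    "P": "PERIOD", "R": "RECUR",
    "T": "TIME", " ": "TIME",  # space is illegal but commonly used
    "W": "WEEK", "Z": "ZULU", "+": "PLUS",
    "Y": "YEAR", "M": "MONTH_OR_MINUTE", "D": "DAY",
    "H": "HOUR", "S": "SECOND",
}


def tokenize(text):
    """Return a list of (token_name, character) pairs."""
    tokens = []
    for ch in text:
        if "0" <= ch <= "9":
            tokens.append(("DIGIT", ch))
        elif ch.upper() in TOKENS:
            # Lower case may be substituted for upper case (4.4).
            tokens.append((TOKENS[ch.upper()], ch))
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


print([name for name, _ in tokenize("PT5M")])
# ['PERIOD', 'TIME', 'DIGIT', 'MONTH_OR_MINUTE']
```

Note that the lexer cannot tell a month M from a minute M, hence the single
MONTH_OR_MINUTE token. That decision has to be made by the parser from
whether a TIME token has already been seen.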
>
>
>
> -- 
> Pete Forman                -./\.-  Disclaimer: This post is originated
> WesternGeco                  -./\.-   by myself and does not represent
> pete.forman at westerngeco.com    -./\.-   opinion of Schlumberger, Baker
> http://petef.port5.com           -./\.-   Hughes or their divisions.
>
>
>
>
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
>
>


 
