[antlr-interest] ANTLR NUB
Gavin Lambert
antlr at mirality.co.nz
Mon Jan 21 11:31:30 PST 2008
At 06:32 22/01/2008, Jan Nielsen wrote:
>
>Here are a few examples of valid expressions:
>
> "from 1/January/2008"
> "from 1/January/2008 to 1/January/2009"
> "from 1/January/2008 to 1/January/2009 excluding 21/January/2008"
> "from 1/January/2008 to 1/January/2009 excluding Thursday-Sunday"
> "from 1/January/2008 to 1/January/2009 excluding Thursday-Sunday including June-July"
> "from 1/January/2008 to 1/January/2009 excluding Monday-Thursday including 21/January/2008"
> "from 1/January/2008 to 1/January/2009 excluding Monday-Thursday including 'Dr. Martin Luther King Day'"
>
>An "including" after an "excluding", i.e., to the right of it,
>overrides the exclusion.
Is it permitted to have repeated clauses, e.g. "from X including A
excluding B including C"?
>prog
> : 'from' date ('to' date)?
> ('including' period)? (',' period)*
> ('excluding' period)? (',' period)*
> ;
This enforces an order between "including" and "excluding", and it is
the opposite of the order your examples above use. At minimum, to get
the examples to work (and assuming repeated clauses are not
permitted), you'll need to swap the two clauses.

Also, the scoping of the comma-separated lists is wrong: each
(',' period)* should be inside its optional clause. As written, a
comma-separated list of periods could appear even when there is no
preceding 'including' or 'excluding', which doesn't make much sense.
So:
prog
: 'from' date ('to' date)? excluding_clause? including_clause?
;
excluding_clause
: 'excluding' period (',' period)*
;
including_clause
: 'including' period (',' period)*
;
>day_of_month_period
> : DAY_OF_MONTH (MONTH)? (YEAR)?
> ;
>
>day_of_week_period
> : DAY_OF_WEEK ('[' OCCURRENCE ']')? (YEAR)?
> ;
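Shouldn't these include the '/' separators, as in 1/January/2008?
You're also not covering some other constructs your examples use,
such as ranges (Thursday-Sunday, June-July) and quoted holiday names
('Dr. Martin Luther King Day').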
>OCCURRENCE
> : '1'..'4'
> ;
>
>YEAR
> : '1'..'9' '0'..'9' '0'..'9' '0'..'9'
> ;
[...]
>DAY_OF_MONTH : '1'..'9' | '1'..'2' '0'..'9' | '30' | '31';
You can't do this. The most important thing to remember is that
all lexing is done up front, with no input from the parser (the
lexer runs before the parser has looked at a single token). Thus
every non-fragment token rule is an independent candidate for
output. Facing a lone '3' in the input stream, the lexer could match
either OCCURRENCE or DAY_OF_MONTH; given no clear preference, ANTLR
will choose the rule listed first and generate an OCCURRENCE token,
which the parser then rejects wherever it expected a day of the
month.
At the lexer level you should just recognise basic integers, and
then validate them based on context in the parser:
NUMBER: ('0'..'9')+;
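// A semantic predicate {...}? holds a boolean expression, so there is
// no trailing ';' inside the braces.
occurrence: n=NUMBER { validateOccurrence($n.text) }?;
year: n=NUMBER { validateYear($n.text) }?;
day_of_month: n=NUMBER { validateDayOfMonth($n.text) }?;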
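The grammar leaves the three validate functions undefined. Here is a
minimal sketch of them in Java, assuming ANTLR 3's Java target. The
class name DateValidators is made up for this example; in a real
grammar the methods would more likely go in the parser's @members
block so the predicates can call them unqualified. The accepted ranges
mirror your original lexer rules (occurrence 1 to 4, a four-digit year
with no leading zero, day of month 1 to 31); checking a day against the
length of a particular month is left out.

```java
// Hypothetical helpers backing the semantic predicates in the
// occurrence, year and day_of_month rules. Each takes the text of a
// NUMBER token and reports whether it is acceptable in that position.
final class DateValidators {
    private DateValidators() {}

    // Occurrence of a weekday within a month, e.g. the 2 in "Monday[2]": 1..4.
    static boolean validateOccurrence(String text) {
        return inRange(text, 1, 4);
    }

    // Four-digit year without a leading zero, as the original YEAR rule required.
    static boolean validateYear(String text) {
        return inRange(text, 1000, 9999);
    }

    // Day of month 1..31; does not check the day against a particular month.
    static boolean validateDayOfMonth(String text) {
        return inRange(text, 1, 31);
    }

    // Parses an unsigned decimal string and checks lo <= value <= hi.
    // Leading zeros are rejected to match the original lexer rules.
    private static boolean inRange(String text, int lo, int hi) {
        if (text == null || text.isEmpty() || text.length() > 9) {
            return false; // empty, or long enough to risk int overflow
        }
        if (text.length() > 1 && text.charAt(0) == '0') {
            return false;
        }
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        int value = Integer.parseInt(text);
        return value >= lo && value <= hi;
    }
}
```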
> But once I have a parser for my expression, how do I actually
> use the parser to implement my API???
There are two common ways to do this. One is to output an AST
from the parser, which in your case could end up looking something
like this (expressed in string form):
^(FROM ^(DATE 1 January 2008) ^(DATE 1 January 2009)
    ^(EXCLUDING ^(DAY ^(RANGE Monday Thursday)))
    ^(INCLUDING ^(DAY "Dr. Martin Luther King Day")))
(Of course the exact syntax is variable; you can put in what you
want for the most part, although a certain structure will be
dictated by how the rules are organised.) Then you can just write
tree-walking code to call your various API functions as
appropriate. (Or even write a tree parser, though that's usually
unnecessary.)
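A minimal sketch of that tree-walking step in Java. To stay
self-contained it uses a hypothetical Node class rather than ANTLR's
own tree types (with ANTLR 3 you would read the same information
through getText(), getChildCount() and getChild(i) on the tree the
parser returns), and instead of calling a real scheduling API it
records hypothetical calls named setStart, setEnd, addExclusion and
addInclusion as strings. It assumes the tree shape shown above: the
first DATE child is the start, an optional second DATE is the end, and
each EXCLUDING or INCLUDING node holds one subtree per period.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TreeWalkSketch {
    // Hypothetical stand-in for an AST node: a label plus ordered children.
    static final class Node {
        final String text;
        final List<Node> children;

        Node(String text, Node... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    // The API calls the walk would make, recorded as strings for illustration.
    final List<String> calls = new ArrayList<>();

    // Walks ^(FROM DATE DATE? EXCLUDING? INCLUDING?) and dispatches on each child.
    void walk(Node root) {
        if (!root.text.equals("FROM")) {
            throw new IllegalArgumentException("expected FROM, got " + root.text);
        }
        boolean haveStart = false;
        for (Node child : root.children) {
            switch (child.text) {
                case "DATE":
                    // The first DATE is the start of the range, a second one the end.
                    calls.add((haveStart ? "setEnd(" : "setStart(") + flatten(child) + ")");
                    haveStart = true;
                    break;
                case "EXCLUDING":
                    for (Node period : child.children) {
                        calls.add("addExclusion(" + flatten(period) + ")");
                    }
                    break;
                case "INCLUDING":
                    for (Node period : child.children) {
                        calls.add("addInclusion(" + flatten(period) + ")");
                    }
                    break;
                default:
                    throw new IllegalArgumentException("unexpected node " + child.text);
            }
        }
    }

    // Renders a subtree compactly, e.g. ^(DAY ^(RANGE Monday Thursday))
    // becomes "DAY(RANGE(Monday,Thursday))". A real walker would build
    // date and period objects here instead of strings.
    static String flatten(Node n) {
        if (n.children.isEmpty()) {
            return n.text;
        }
        StringBuilder sb = new StringBuilder(n.text).append('(');
        for (int i = 0; i < n.children.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(flatten(n.children.get(i)));
        }
        return sb.append(')').toString();
    }
}
```

Dispatching on the node label like this is usually all a tree this
shallow needs, which is why a separate tree grammar is rarely worth it.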
Another approach is to embed action code in the grammar so that it
runs as you parse. For example:
excluding_clause
: 'excluding' p=period { addExclusion($p.result); }
(',' p=period { addExclusion($p.result); } )*
;
Of course, for this to work you'll also need to give the period and
date rules 'returns' clauses, so that each one builds a data structure
describing what it has just recognised.
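As an illustration of what such a returned structure might look like,
here is a Java sketch using java.time (Java 8 or later). Schedule,
PeriodSpec and the factory methods day, weekdays and months are
hypothetical names; addExclusion and addInclusion match the calls in
the action code above. A rule declared along the lines of
period returns [Schedule.PeriodSpec result] could build one of these
values for the actions to pass on. The sketch assumes the single
excluding-then-including form of the grammar, an inclusive 'to' date
(no 'to' meaning open-ended), and that an inclusion only re-admits
dates inside the from/to range, which is one reading of your rule that
a later "including" overrides the exclusion.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.Month;
import java.util.ArrayList;
import java.util.List;

final class Schedule {
    // Hypothetical value a period rule could return: a set of dates,
    // represented as a membership test.
    interface PeriodSpec {
        boolean contains(LocalDate date);
    }

    // A single day, e.g. 21/January/2008.
    static PeriodSpec day(LocalDate target) {
        return date -> date.equals(target);
    }

    // An inclusive weekday range such as Thursday-Sunday; a range like
    // Saturday-Monday wraps around the end of the week.
    static PeriodSpec weekdays(DayOfWeek first, DayOfWeek last) {
        return date -> between(date.getDayOfWeek().getValue(), first.getValue(), last.getValue());
    }

    // An inclusive month range such as June-July, wrapping the same way.
    static PeriodSpec months(Month first, Month last) {
        return date -> between(date.getMonthValue(), first.getValue(), last.getValue());
    }

    private static boolean between(int v, int first, int last) {
        return first <= last ? (v >= first && v <= last) : (v >= first || v <= last);
    }

    private final LocalDate from;
    private final LocalDate to; // null when there is no 'to' clause
    private final List<PeriodSpec> exclusions = new ArrayList<>();
    private final List<PeriodSpec> inclusions = new ArrayList<>();

    Schedule(LocalDate from, LocalDate to) {
        this.from = from;
        this.to = to;
    }

    // Called from the excluding_clause and including_clause actions.
    void addExclusion(PeriodSpec p) { exclusions.add(p); }
    void addInclusion(PeriodSpec p) { inclusions.add(p); }

    // A date counts if it lies in [from, to] and is either not excluded
    // or re-included, since an inclusion overrides an exclusion.
    boolean contains(LocalDate date) {
        if (date.isBefore(from) || (to != null && date.isAfter(to))) {
            return false;
        }
        boolean excluded = exclusions.stream().anyMatch(p -> p.contains(date));
        boolean included = inclusions.stream().anyMatch(p -> p.contains(date));
        return !excluded || included;
    }
}
```

Named holidays such as 'Dr. Martin Luther King Day' would need one more
PeriodSpec, for instance backed by a table of holiday dates, which is
left out here. Representing each period as a membership test keeps the
schedule lazy: nothing is expanded into a list of dates until contains
is asked about a particular day.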