[antlr-interest] ANTLR NUB

Jan Nielsen jan.sture.nielsen at gmail.com
Mon Jan 21 22:25:51 PST 2008


Hi Gavin and Andy,

Thanks for the input! Per your suggestions, I modified my grammar (and
the DSL a bit: the keywords are now 'exclude' and 'include'), and the
resulting parser now passes these test cases:

 "from 1/January/2008"
 "from 1/January/2008 to 1/January/2009"
 "from 1/January/2008 to 1/January/2009 exclude 1/January/2008"
 "from 1/January/2008 to 1/January/2009 exclude 21/January/2008"
 "from 1/January/2008 to 1/January/2009 exclude Thursday-Sunday"
 "from 1/January/2008 to 1/January/2009 exclude Monday, Wednesday, Friday"
 "from 1/January/2008 to 1/January/2009 exclude Thursday[4]/November"
 "from 1/January/2008 to 1/January/2009 exclude Thursday-Sunday include June-July"
 "from 1/January/2008 to 1/January/2009 exclude Monday-Thursday include 21/January/2008"
 "from 1/January exclude 1/January"
 "from 1/January exclude 21/January"
 "from 1/January exclude Thursday-Sunday"
 "from 1/January exclude Monday, Wednesday, Friday"
 "from 1/January exclude Thursday[4]/November"
 "from 1/January exclude Thursday-Sunday include June-July"
 "from 1/January exclude Monday-Thursday include 21/January/2008"

I initially envisioned supporting repeated exclusion and inclusion
clauses, but I don't think I need them right now; I'll probably have a
go at that once I get the parser and tie-ins working.

Thanks, again, for your help.


-Jan


grammar T;

options{
    output = AST;
    ASTLabelType = CommonTree;
}

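// EOF! makes the parser report trailing input instead of stopping
// silently at the first token it cannot use; '!' keeps EOF out of the AST.
prog
    : 'from' date ('to' date)? exclude_clause? include_clause? EOF!
    ;

date
    : day_of_month '/' MONTH ('/' year)?
    ;

exclude_clause
    : 'exclude' period (',' period)*
    ;

include_clause
    : 'include' period (',' period)*
    ;

period
    : day_of_month_period
    | day_of_week_period
    | month_period
    ;

day_of_month_period
    : date ('-' date)?
    ;

// ('/' MONTH)? covers forms such as Thursday[4]/November from the test
// cases; without it the parser stops before the '/'.
day_of_week_period
    : DAY_OF_WEEK ('[' occurrence ']')? ('/' MONTH)? ('-' DAY_OF_WEEK)?
    ;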

month_period
    : MONTH ('-' MONTH)?
    ;

occurrence
    : NUMBER
    ;

year
    : NUMBER
    ;

MONTH
    : 'January'
    | 'February'
    | 'March'
    | 'April'
    | 'May'
    | 'June'
    | 'July'
    | 'August'
    | 'September'
    | 'October'
    | 'November'
    | 'December'
    ;

day_of_month
    : NUMBER
    ;

DAY_OF_WEEK
    : 'Monday'
    | 'Tuesday'
    | 'Wednesday'
    | 'Thursday'
    | 'Friday'
    | 'Saturday'
    | 'Sunday'
    ;

NUMBER
    : ('0'..'9')+
    ;

WS  :  (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;}
    ;

COMMENT
    :   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

LINE_COMMENT
    : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    ;
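
The occurrence, year and day_of_month rules above accept any NUMBER, so
the range checks still have to live somewhere. Below is a minimal Java
sketch of the helpers that the validating predicates in the quoted reply
below could call. The class name DateFieldChecks is hypothetical, and the
ranges are assumptions taken from the original lexer rules quoted below
(occurrence 1-4, four-digit year, day 1-31). It does not check the day
against the month, so 31/February would still pass.

```java
final class DateFieldChecks {
    private DateFieldChecks() {}

    /** Converts a NUMBER token's text to an int; -1 if it is not a usable int. */
    private static int toInt(String text) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            return -1; // null, empty, non-numeric, or too many digits for an int
        }
    }

    /** Assumed range 1..4, as in the original OCCURRENCE rule. */
    static boolean validateOccurrence(String text) {
        int n = toInt(text);
        return n >= 1 && n <= 4;
    }

    /** Assumed to be exactly four digits with no leading zero, as in the original YEAR rule. */
    static boolean validateYear(String text) {
        int n = toInt(text);
        return n >= 1000 && n <= 9999 && text.length() == 4;
    }

    /** Assumed range 1..31; leading zeros such as "07" are accepted. */
    static boolean validateDayOfMonth(String text) {
        int n = toInt(text);
        return n >= 1 && n <= 31;
    }

    public static void main(String[] args) {
        System.out.println(validateDayOfMonth("21")); // true
        System.out.println(validateDayOfMonth("32")); // false
        System.out.println(validateOccurrence("4"));  // true
        System.out.println(validateYear("208"));      // false
    }
}
```

Returning a boolean rather than throwing keeps the helpers usable inside a
{...}? predicate, which expects a boolean expression.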


On Jan 21, 2008 12:31 PM, Gavin Lambert <antlr at mirality.co.nz> wrote:
> At 06:32 22/01/2008, Jan Nielsen wrote:
>  >
>  >Here are a few examples of valid expressions:
>  >
>  > "from 1/January/2008"
>  > "from 1/January/2008 to 1/January/2009"
>  > "from 1/January/2008 to 1/January/2009 excluding 21/January/2008"
>  > "from 1/January/2008 to 1/January/2009 excluding 21/January/2008"
>  > "from 1/January/2008 to 1/January/2009 excluding Thursday-Sunday"
>  > "from 1/January/2008 to 1/January/2009 excluding Thursday-Sunday including June-July"
>  > "from 1/January/2008 to 1/January/2009 excluding Monday-Thursday including 21/January/2008"
>  > "from 1/January/2008 to 1/January/2009 excluding Monday-Thursday including 'Dr. Martin Luther King Day'"
>  >
>  >An "including" that follows an "excluding", i.e., appears to its
>  >right, overrides the exclusion.
>
> Is it permitted to have repeated clauses?  I.e., "from X including A
> excluding B including C"?
>
>  >prog
>  >    : 'from' date ('to' date)?
>  >      ('including' period)? (',' period)*
>  >      ('excluding' period)? (',' period)*
>  >    ;
>
> This enforces an order between "including" and "excluding"; one
> which doesn't match your examples above.  At minimum to get the
> examples to work (and assuming repeated clauses are not permitted)
> you'll need to reverse these.
>
> Also your scoping on the comma-separated bits is wrong; this
> should be inside the optional clause (otherwise it doesn't make
> much sense).  So:
>
> prog
>    : 'from' date ('to' date)? excluding_clause? including_clause?
>    ;
>
> excluding_clause
>    : 'excluding' period (',' period)*
>    ;
>
> including_clause
>    : 'including' period (',' period)*
>    ;
>
>  >day_of_month_period
>  >    : DAY_OF_MONTH (MONTH)? (YEAR)?
>  >    ;
>  >
>  >day_of_week_period
>  >    : DAY_OF_WEEK ('[' OCCURRENCE ']')? (YEAR)?
>  >    ;
>
> Shouldn't these have slashes?  You're also not covering other
> types of constructs permitted by your examples.
>
>  >OCCURRENCE
>  >    : '1'..'4'
>  >    ;
>  >
>  >YEAR
>  >    : '1'..'9' '0'..'9' '0'..'9' '0'..'9'
>  >    ;
> [...]
>  >DAY_OF_MONTH : '1'..'9' | '1'..'2' '0'..'9' | '30' | '31';
>
> You can't do this.  The most important thing to remember is that
> all lexing is done up front with no input from the parser (since
> the parser doesn't even exist yet).  Thus any non-fragment tokens
> become independent candidates for output.  Facing a '3' in the
> input stream, it could match any one of these rules; given no
> clear preference ANTLR will choose the first listed and generate
> an OCCURRENCE token, which you're not accepting when it finally
> does reach the parser.
>
> At the lexer level you should just recognise basic integers, and
> then validate them based on context in the parser:
>
> NUMBER: ('0'..'9')+;
>
> occurrence: n=NUMBER { validateOccurrence($n.text) }?;
> year: n=NUMBER { validateYear($n.text) }?;
> day_of_month: n=NUMBER { validateDayOfMonth($n.text) }?;
>
>  > But once I have a parser for my expression, how do I actually
>  > use the parser to implement my API???
>
> There are two common ways to do this.  One is to output an AST
> from the parser, which in your case could end up looking something
> like this (expressed in string form):
>    ^(FROM ^(DATE 1 January 2008) ^(DATE 1 January 2009)
>       ^(EXCLUDING ^(DAY ^(RANGE Monday Thursday)))
>       ^(INCLUDING ^(DAY "Dr. Martin Luther King Day")))
>
> (Of course the exact syntax is variable; you can put in what you
> want for the most part, although a certain structure will be
> dictated by how the rules are organised.)  Then you can just write
> tree-walking code to call your various API functions as
> appropriate.  (Or even write a tree parser, though that's usually
> unnecessary.)
>
> Another approach is to simply include the action code as you are
> parsing.  For example:
>
> excluding_clause
>    : 'excluding' p=period { addExclusion($p.result); }
>      (',' p=period { addExclusion($p.result); } )*
>    ;
>
> Of course for this to work, you'll need to also enhance the period
> and date rules with 'returns' clauses, which create a data
> structure that describes what they have just recognised.
>
>
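
For the action-based approach in the reply above, the period rule has to
return some object and addExclusion needs somewhere to put it. Here is a
rough Java sketch of what that could look like. Everything except the
name addExclusion is hypothetical (ScheduleSketch, Period, addInclusion,
isActive), and it assumes Java 8 for java.time and lambdas. It models only
day-of-week ranges and single dates, ignores the from/to bounds, and
applies the rule from the original question that an include to the right
of an exclude overrides it.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

final class ScheduleSketch {
    /** What a rule such as `period returns [Period result]` could hand back. */
    interface Period {
        boolean contains(LocalDate date);
    }

    /** Models e.g. Monday-Thursday; a range may wrap past Sunday (Saturday-Monday). */
    static Period dayOfWeekRange(DayOfWeek first, DayOfWeek last) {
        int lo = first.getValue(); // Monday = 1 ... Sunday = 7
        int hi = last.getValue();
        return date -> {
            int d = date.getDayOfWeek().getValue();
            return lo <= hi ? (d >= lo && d <= hi) : (d >= lo || d <= hi);
        };
    }

    /** Models a fully specified date such as 21/January/2008. */
    static Period singleDate(LocalDate day) {
        return date -> date.equals(day);
    }

    private final List<Period> exclusions = new ArrayList<>();
    private final List<Period> inclusions = new ArrayList<>();

    /** Called from the exclude clause's action, once per period. */
    void addExclusion(Period p) { exclusions.add(p); }

    /** Called from the include clause's action, once per period. */
    void addInclusion(Period p) { inclusions.add(p); }

    /** An inclusion wins over any exclusion; dates matched by neither are active. */
    boolean isActive(LocalDate date) {
        for (Period p : inclusions) {
            if (p.contains(date)) return true;
        }
        for (Period p : exclusions) {
            if (p.contains(date)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // "... exclude Monday-Thursday include 21/January/2008"
        ScheduleSketch s = new ScheduleSketch();
        s.addExclusion(dayOfWeekRange(DayOfWeek.MONDAY, DayOfWeek.THURSDAY));
        s.addInclusion(singleDate(LocalDate.of(2008, 1, 21)));
        System.out.println(s.isActive(LocalDate.of(2008, 1, 21))); // true (Monday, re-included)
        System.out.println(s.isActive(LocalDate.of(2008, 1, 22))); // false (Tuesday, excluded)
        System.out.println(s.isActive(LocalDate.of(2008, 1, 25))); // true (Friday, untouched)
    }
}
```

Keeping two separate lists works only because the grammar allows at most
one exclude clause followed by one include clause. Repeated clauses would
need a single ordered list in which the last matching clause wins.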

