[antlr-interest] Rule for ISO 8601 - Duration Standard

Tue Dec 16 10:37:03 PST 2008

At 00:52 17/12/2008, Javier Gomez Escribano wrote:
>I'm trying to parse a "duration string" which is in format ISO 
>8601:
>PnnYnnMnnDTnnHnnMnnS (P1Y0M27DT11H9M11S)
>where nn are numbers.
>(A reference to ISO 8601 for durations... 
>http://en.wikipedia.org/wiki/ISO_8601)
>
>The problem is that I have to put a white space between letters 
>and numbers. And it is something I'd like to avoid.
>
>Here is part of my code
>
>duration    :    (P_TOKEN (year=number Y_TOKEN)? (month=number 
>M_TOKEN)? (day=number D_TOKEN)?)?(T_TOKEN (hour=number H_TOKEN)? 
>(min=number M_TOKEN)? (sec=number S_TOKEN)?)? -> ^(YEAR $year)? 
>^(MONTH $month)? ^(DAY $day)? ^(HOUR $hour)? ^(MINUTE $min)? 
>^(SECOND $sec)?    ;

You should make this a lexer rule, to recognise the entire thing 
as a single token.  (Trying to use individual letters as tokens 
like you're doing here quickly leads to complications.)

After that you can still split it up again in the parser for AST 
generation; just manually parse it at that point.  This will also 
let you produce better error messages in the cases of bad 
structure or out of range values.
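
As a rough illustration of the "manually parse it" step, here is a minimal 
sketch in Java (ANTLR's default target).  It assumes the lexer has already 
handed you the whole duration as one token and that you pass that token's 
text in as a String.  The class name Iso8601Duration and its parse method 
are made up for this example and are not part of ANTLR.  It also assumes 
whole-number fields only and a mandatory leading P, so week (nnW) and 
fractional forms are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an ANTLR class): splits the text of a single
// duration token such as "P1Y0M27DT11H9M11S" into its numeric fields.
final class Iso8601Duration {
    // One optional capture group per field, in the order Y M D (T) H M S.
    private static final Pattern DURATION = Pattern.compile(
        "P(?:(\\d+)Y)?(?:(\\d+)M)?(?:(\\d+)D)?"
        + "(?:T(?:(\\d+)H)?(?:(\\d+)M)?(?:(\\d+)S)?)?");

    /** Returns {years, months, days, hours, minutes, seconds}; absent fields are 0. */
    static int[] parse(String text) {
        Matcher m = DURATION.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Malformed duration: " + text);
        }
        // The pattern alone would accept a bare "P" or a dangling "T".
        if (text.equals("P") || text.endsWith("T")) {
            throw new IllegalArgumentException("Duration has no fields: " + text);
        }
        int[] fields = new int[6];
        for (int i = 0; i < fields.length; i++) {
            String digits = m.group(i + 1);
            // Integer.parseInt throws NumberFormatException (a subclass of
            // IllegalArgumentException) if the number does not fit in an int.
            fields[i] = (digits == null) ? 0 : Integer.parseInt(digits);
        }
        return fields;
    }
}
```

Calling Iso8601Duration.parse on the text of the duration token gives you 
the six values to build the YEAR/MONTH/DAY/... nodes from, and the 
exceptions are the place to hook in your own error messages or range 
checks (for example, rejecting minutes above 59 if you want that).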