[antlr-interest] question on greedy matching

Gerrit E.G. Hobbelt Ger.Hobbelt at bermuda-holding.com
Thu Oct 20 14:20:31 PDT 2005


From: "Ric Klaren" <ric.klaren at gmail.com>
[...]
> > Consider the following language design for a sort of a try statement:
> >
> > try statement ( catch (ID ID) '{' statement '}' )+
[...]
> > Moreover, in the example:
> >
> > try
> >      try
> >          foo();
> >      catch (E1 e) { bar(); }
> > catch (E2 e) { baz(); }
> > catch (E3 e) { baa(); }
> >
> > again the rule makes the statements bind as the indentation suggests.
[...]
> Alternatively:
>
> start out with a try rule with only one catch. Then define an extra
> alternative (probably in the statement rule to collect trailing catch
> statements) Some checks are probably necessary afterwards to see if
> everything made sense. The catch all might need a predicate to prevent
> it from being entered if the preceding statement was not a try.

Hm, I just might be saying the same thing (I don't think so, but I might've
misinterpreted that last paragraph there) when I suggest this solution, of
course assuming your meta-rule applies:



Use two rules for 'try/catch': one for an 'outer try' (which by your
definition may have multiple consecutive catch phrases) and one for 'inner
try' statement blocks, where each 'inner try' must have only one 'catch',
hence matching your meta-rule.

Excuse me, but my ANTLR is *very* rusty ATM, though my attempt at this would
probably look a bit like this:

-------------

// fluff
source: ( function )+
function: header base_statement_block;

// here it comes: inner vs. outer
base_statement_block: (no_try_statement | outer_try_stmt)+;

outer_try_stmt: 'try'
                    (no_try_statement | inner_try_stmt)+
               (catch_phrase)+; // accept one or more catch sections

inner_try_stmt: 'try'
                    (no_try_statement | inner_try_stmt)+
               catch_phrase;  // accept one and only one catch section

// and choose one of these 'catch_phrase' rules:
// alternative A for catch with mandatory curly's:
//             try statement ( catch (ID ID) '{' statement '}' )+

catch_phrase: '{' base_statement_block '}';
              // outer_try_stmt acceptable as curly's unambiguously
delineate a base_statement_block

// or this enhancement for A:
// alternative B, which _includes_ support for
//             try statement ( catch (ID ID) statement )+
// assuming the additional meta-rule defined below

catch_phrase: (no_try_statement | inner_try_stmt)+ ;
                // accept try/catch nesting in catch section, naturally ;-)

no_try_statement: '{' base_statement_block '}'
                             // curly's explicitly delineate a code section,
like in the 'C' language
                          |  // all non-'try' single statements for this
language:
                            'if' yada | 'for' walla |
whatever__you_get_the_idea;

-------------

If I'm not mistaken, this is still somewhat readable, doesn't need
predicates and is a hack which even makes

> > try statement ( catch (ID ID) statement )+

work, when using catch_phrase rule 'B' and assuming an EXTRA (2nd) meta-rule
to disambiguate this lovely language: "all statements following a 'catch'
keyword will be assumed to be part of the catch phrase until another 'catch'
statement at the same block-level is found or the end of this code section
has been reached (i.e. one of: end of function, end of block, end of
source)".

It would make

> lot of disambiguation, because how are you going to interpret:
>
> try
>       try
>           foo();
>       catch (E1 e) try bar(); catch (E3 e) zip();
> catch (E2 e) baz();

unambiguously parseable - or am I completely out of my mind by now? (again,
I assume both meta rules here. No rules? No dice.)

I expect my 'solution' to provide this parse result - curly braces added to
identify major parse tree levels:

try {                   // outer_try - level #0
      try {             // inner_try - level #1
           foo();        // no_try_stmt
       }
       catch (E1 e) { // catch_phrase @ level #1
         // 2nd meta-rule says: gobble as much as you can:
         try {          // inner_try as part of catch_phrase [i.e. level #2]
            bar();     // no_try_Stmt
         }
         catch (E3 e) { // catch_phrase @ level #2
            zip();
         }
       } // 'catch (E2 ...' is identified as second catch at level #2 -->
fallback to #1
       // ... also second catch at #1 --> fallback to #0
}
catch (E2 e) {        // catch_phrase for outer_try [level #0]
  baz();
}
// end of code section


The above should probably even work with yuck, I mean, yacc. ;-)


HTH,

Ger



More information about the antlr-interest mailing list