[antlr-interest] Predicate hoisting pain

Jim Idle jimi at temporal-wave.com
Mon Apr 13 08:01:26 PDT 2009


Sam Barnett-Cormack wrote:
> Hi all,
>
> So, in my grammar I have need to re-use rules so they ultimately refer 
> to a different rule (so I don't have to duplicate 
> intersection/union/exception rules). I use a parameter and gated 
> predicates, like so:
>
> elements[boolean os]
>    : {!$os}?=>subtypeElements
>    | {$os}?=>objectSetElements
>    | LPAREN! elementSetSpec[$os] RPAREN!
>    ;
>
> This is ultimately referred to from two places. The first, which 
> generates code that's just fine, is:
>
> elementSetSpecs
>    : rootElementSetSpec[false] (COMMA EXTMARK (COMMA 
> additionalElementSetSpec[false])?)?
>    -> ^(ELEMENTSET rootElementSetSpec EXTMARK? additionalElementSetSpec?)
>    ;
>
> However, in the *slightly* more complex case:
>
> objectSetSpec
>    : rootElementSetSpec[true] (COMMA EXTMARK 
> additionalElementSetSpec[true]?)?
>    | EXTMARK (COMMA additionalElementSetSpec[true])?
>    ;
>
> The predicates get hoisted in the generated code, and then there's 
> compile errors with undefined variable 'os'.
>
> I'm not sure why it happens in one case and not the other, and I'm even 
> less clear on how to fix it. Can anyone help?
>
>   
This is basically an FAQ, and you have answered your own question as to
why: the parameter is local to the rule, but for some decisions the
predicate code must be (or can be) hoisted into a calling rule, where
that parameter is not defined.

The solution is relatively simple, but it probably isn't the correct
solution: needing this suggests that you are probably going wrong in
the way you are constructing the parser. What you should really do is
merge these two possibilities in the parser. Then, if your tree walk
detects a construct that is not valid for the context, you have parsed
it anyway and can issue a really good semantic error along the lines of
"Element specs like FOO cannot be used within specs for BARs". If you do
not do this, your users will just get "Syntax error at FOO!", and unless
they are already very knowledgeable about the language, they won't
really know what this means.
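
To make the tree-walk idea concrete, here is a minimal sketch in plain
Java. It is not ANTLR tree-grammar code: AstNode, the node kinds
("OBJECT_SET", "ELEMENT_SET", "SUBTYPE_ELEMENTS", "OBJECT_SET_ELEMENTS")
and the message wording are all assumptions made for illustration. It
only shows the shape of the check: parse both forms, then report
misplaced constructs with a position and an explanation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified AST node. A real ANTLR tree walker would visit
// the tree built by the parser; this class only stands in for it.
class AstNode {
    final String kind;            // assumed node kinds, e.g. "OBJECT_SET"
    final int line;
    final int offset;
    final List<AstNode> children;

    AstNode(String kind, int line, int offset, AstNode... children) {
        this.kind = kind;
        this.line = line;
        this.offset = offset;
        this.children = Arrays.asList(children);
    }
}

class SemanticCheckDemo {
    // Walk the tree, tracking the nearest enclosing set kind, and collect
    // context errors instead of rejecting the input in the parser.
    static void check(AstNode node, boolean inObjectSet, List<String> errors) {
        if (node.kind.equals("OBJECT_SET")) {
            inObjectSet = true;
        } else if (node.kind.equals("ELEMENT_SET")) {
            inObjectSet = false;
        }
        if (node.kind.equals("SUBTYPE_ELEMENTS") && inObjectSet) {
            errors.add(where(node)
                    + "Subtype elements cannot be used within an object set.");
        } else if (node.kind.equals("OBJECT_SET_ELEMENTS") && !inObjectSet) {
            errors.add(where(node)
                    + "Object set elements can only be used within an object set.");
        }
        for (AstNode child : node.children) {
            check(child, inObjectSet, errors);
        }
    }

    // Prefix every message with the position of the offending node.
    static String where(AstNode node) {
        return "Line " + node.line + ", offset " + node.offset + ": ";
    }

    public static void main(String[] args) {
        AstNode tree = new AstNode("OBJECT_SET", 3, 0,
                new AstNode("SUBTYPE_ELEMENTS", 3, 12));
        List<String> errors = new ArrayList<>();
        check(tree, false, errors);
        for (String e : errors) {
            System.out.println(e);
        }
    }
}
```

The parser accepts either kind of element in either place; only the
walker knows the context, so it is the walker that can say what was
wrong and where.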

However, as you can obviously distinguish the cases at some point higher
up the rule chain, if you wish to pursue this, all you need do is create
a scope with your flag in it at a high enough level, init it to the
default case, set/unset it as the rules descend, and then use it as the
gated predicate in your rule above:

highuprule
    scope
     { boolean os; }
    @init { $highuprule::os = false; }
: rule rule rule ... ;

...

// Z present means flip the flag
ruleX :  X  Y (Z { $highuprule::os = true; }  objectSetSpec)?
;

objectSetSpec
   : {$highuprule::os}?=>additionalSetSpec
   | something else
   ;


Because scopes are globally available to the parsing context, hoisting
the predicate does not take the flag out of scope the way it does with
a local rule parameter.
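
As a rough illustration of why the scope survives hoisting, here is a
hand-written Java sketch. It is not actual ANTLR output: the class and
method names are made up, and the parsing itself is reduced to a boolean
saying whether Z was seen. The point is only that the flag lives on a
stack held by the parser object, so prediction code that ends up outside
the rule that declared the predicate can still read it, whereas a rule
parameter is a local of one method only.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ScopeHoistDemo {
    // Hypothetical stand-in for the scope declared on highuprule.
    static class HighUpRuleScope {
        boolean os;
    }

    // A field of the parser object: visible from every rule method.
    private final Deque<HighUpRuleScope> highUpRuleStack = new ArrayDeque<>();

    // highuprule: push a fresh scope on entry (os starts out false, as in
    // the @init action) and pop it again on exit.
    String highUpRule(boolean sawZ) {
        highUpRuleStack.push(new HighUpRuleScope());
        try {
            return ruleX(sawZ);
        } finally {
            highUpRuleStack.pop();
        }
    }

    // ruleX: seeing Z flips the flag before objectSetSpec is predicted.
    String ruleX(boolean sawZ) {
        if (sawZ) {
            highUpRuleStack.peek().os = true;
        }
        return predictObjectSetSpecAlt() == 1
                ? "additionalSetSpec"
                : "something else";
    }

    // Prediction code living outside objectSetSpec, as it may after
    // hoisting. It compiles because it reads the shared stack and not a
    // parameter that belongs to some other method.
    private int predictObjectSetSpecAlt() {
        return highUpRuleStack.peek().os ? 1 : 2;
    }

    public static void main(String[] args) {
        ScopeHoistDemo parser = new ScopeHoistDemo();
        System.out.println(parser.highUpRule(true));
        System.out.println(parser.highUpRule(false));
    }
}
```

Had the flag been a parameter of one rule method, the copy of the
predicate in predictObjectSetSpecAlt() would refer to a name that does
not exist there, which is the "undefined variable" compile error
described above.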

However, remember the rules of good construction:

1) Anything that can be moved from a syntactic error in the lexer to a
semantic error, or left to the parser, should be;
2) Anything that can be moved from a syntax error in the parser to a
semantic error in the tree walker, should be.

In general this means that error messages from your front end will be as 
good as they can be:

1) "Unknown character '\u8290'" in the lexer becomes "Line 20, offset
42: The character '\u8290' is not a valid character for use in a
variable name!"
2) "No viable alt at 'FOO'" becomes "Line 42, offset 22: The construct
FOO cannot be used within a BAR, only within a BAZ, try specifying as a
BARRY."

and so on.
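
For the lexer case, the idea can be sketched as follows, again in plain
Java and not as ANTLR lexer code. It assumes the lexer has been made
lenient enough to accept the whole name as one token, and that valid
names are ASCII letters, digits and '_' with no leading digit; both are
assumptions for this example only. The validation then happens after
the fact, where the position and the reason can be reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class IdentifierCheckDemo {
    // Assumed rule for this sketch: ASCII letters and '_' anywhere,
    // digits anywhere except the first position.
    static boolean isValid(char c, int index) {
        boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || c == '_';
        boolean digit = c >= '0' && c <= '9';
        return letter || (digit && index > 0);
    }

    // Validate a token that a deliberately lenient lexer already accepted
    // as a name, reporting each bad character with its position.
    static List<String> check(String name, int line, int startOffset) {
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!isValid(c, i)) {
                errors.add(String.format(Locale.ROOT,
                        "Line %d, offset %d: The character '\\u%04X' is not a"
                        + " valid character for use in a variable name!",
                        line, startOffset + i, (int) c));
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // The name starts at offset 40 on line 20; its third character
        // is the invalid one.
        for (String e : check("fo\u8290o", 20, 40)) {
            System.out.println(e);
        }
    }
}
```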

Jim

