[antlr-interest] Predicate hoisting pain

Sam Barnett-Cormack s.barnett-cormack at lancaster.ac.uk
Mon Apr 13 08:25:22 PDT 2009


Jim Idle wrote:
> Sam Barnett-Cormack wrote:
>> Hi all,
>>
>> In my grammar I need to reuse a chain of rules so that they ultimately 
>> refer to one of two different rules (so I don't have to duplicate the 
>> intersection/union/exception rules). I use a parameter and gated 
>> predicates, like so:
>>
>> elements[boolean os]
>>    : {!$os}?=>subtypeElements
>>    | {$os}?=>objectSetElements
>>    | LPAREN! elementSetSpec[$os] RPAREN!
>>    ;
>>
>> This is ultimately referred to from two places. The first, which 
>> generates code that's just fine, is:
>>
>> elementSetSpecs
>>    : rootElementSetSpec[false] (COMMA EXTMARK (COMMA additionalElementSetSpec[false])?)?
>>    -> ^(ELEMENTSET rootElementSetSpec EXTMARK? additionalElementSetSpec?)
>>    ;
>>
>> However, in the *slightly* more complex case:
>>
>> objectSetSpec
>>    : rootElementSetSpec[true] (COMMA EXTMARK additionalElementSetSpec[true]?)?
>>    | EXTMARK (COMMA additionalElementSetSpec[true])?
>>    ;
>>
>> The predicates get hoisted in the generated code, and then there are 
>> compile errors about an undefined variable 'os'.
>>
>> I'm not sure why it happens in one case and not the other, and I'm even 
>> less clear on how to fix it. Can anyone help?
>>
>>   
> This is basically an FAQ, and you have answered your own question about 
> why: the rule parameter is local to the rule, but for some decisions the 
> predicate must be (or can be) hoisted into code where that parameter is 
> not in scope.
> 
> The solution is relatively simple, but it probably isn't the correct 
> solution: needing this suggests that you are going wrong in the way you 
> are constructing the parser. What you should really do is merge these 
> two possibilities in the parser. Then, if your tree walk detects a 
> construct that is not valid for the context, you parse it anyway but 
> issue a really good semantic error along the lines of "Element specs 
> like FOO cannot be used within specs for BARs". If you do not do this, 
> your users will just get "Syntax error at FOO!", and unless they are 
> already very knowledgeable about the language, they won't really know 
> what this means.

> However, remember the rules of good construction:
> 
> 1) Anything that can be moved from a syntactic error in the lexer to a 
> semantic error, or left to the parser, should be;
> 2) Anything that can be moved from a syntax error in the parser to a 
> semantic error in the tree walker, should be;
> 
> In general this means that error messages from your front end will be as 
> good as they can be:
> 
> 1) "Unknown character '\u8290'" in the lexer becomes "Line 20, offset 
> 42: The character '\u8290' is not a valid character for use in a 
> variable name!"
> 2) "No viable alt at 'FOO'" becomes "Line 42, offset 22: The construct 
> FOO cannot be used within a BAR, only within a BAZ, try specifying as a 
> BARRY."

So I would merge the two in the parser, and then separate them again in 
the tree parser, and then do the context-sensitive validation there? In 
this case, a user would be more likely to make a mistake that looks like 
a mixture of a valueSet and an objectSet, rather than use one in the 
place of another. They look different in all but the simplest cases 
(where all the values or objects in a set are references, i.e. names).
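
A minimal Java sketch of that kind of context check, assuming the merged 
parse result has already been flattened into a list of elements with 
source positions. The class, enum, and field names are hypothetical; none 
of them come from the grammar or from the ANTLR runtime:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a tree-walk check; nothing here is ANTLR API.
final class SetElementCheck {
    enum Kind { VALUE_LITERAL, OBJECT_LITERAL, REFERENCE }
    enum Context { VALUE_SET, OBJECT_SET }

    // One element of a parsed set, with its source position.
    static final class Element {
        final Kind kind;
        final String text;
        final int line;
        final int offset;

        Element(Kind kind, String text, int line, int offset) {
            this.kind = kind;
            this.text = text;
            this.line = line;
            this.offset = offset;
        }
    }

    // Returns one message per element that does not fit the context.
    // References (names) are accepted in either context.
    static List<String> validate(Context context, List<Element> elements) {
        List<String> errors = new ArrayList<String>();
        for (Element e : elements) {
            boolean wrongKind =
                (context == Context.VALUE_SET && e.kind == Kind.OBJECT_LITERAL)
                || (context == Context.OBJECT_SET && e.kind == Kind.VALUE_LITERAL);
            if (wrongKind) {
                String what = e.kind == Kind.OBJECT_LITERAL ? "object literal" : "value literal";
                String where = context == Context.VALUE_SET ? "a value set" : "an object set";
                errors.add("Line " + e.line + ", offset " + e.offset + ": the "
                    + what + " '" + e.text + "' cannot be used within " + where + ".");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<Element> elements = Arrays.asList(
            new Element(Kind.REFERENCE, "foo", 3, 10),
            new Element(Kind.OBJECT_LITERAL, "{&id 1}", 3, 16));
        for (String message : validate(Context.VALUE_SET, elements)) {
            System.out.println(message);
        }
    }
}
```

The point of the sketch is only that the message can name the construct 
and the context, which a hoisted predicate failure in the parser cannot.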

However, something else in the language requires differentiation between 
valueSets and objectSets to be deferred until semantic-building time 
(when the type of the LHS of an expression is known), so I guess I'll 
have to do that. It just sticks in my craw to let the parser allow 
through something that isn't valid as *either*. There is a way around 
that as well, though: boolean flags that get set on seeing a value 
literal or an object literal (the things that can't be mixed), so that a 
mixed case won't get through. Still, I suspect that is better left to 
the semantic stage, where each element of the set can be validated 
against the LHS that it goes with.
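
A rough Java sketch of the flag idea, assuming the flags live in a small 
helper object that parser or tree-walker actions would call once per 
literal. The class and method names are made up for illustration and are 
not part of ANTLR:

```java
// Hypothetical helper: records which kinds of literal a set has contained.
final class LiteralMixTracker {
    private boolean sawValueLiteral;
    private boolean sawObjectLiteral;

    // Would be called from an action when a value literal is matched.
    void valueLiteralSeen() {
        sawValueLiteral = true;
    }

    // Would be called from an action when an object literal is matched.
    void objectLiteralSeen() {
        sawObjectLiteral = true;
    }

    // True only when both kinds have appeared in the same set.
    boolean isMixed() {
        return sawValueLiteral && sawObjectLiteral;
    }

    public static void main(String[] args) {
        LiteralMixTracker tracker = new LiteralMixTracker();
        tracker.valueLiteralSeen();
        System.out.println("mixed after value literal: " + tracker.isMixed());
        tracker.objectLiteralSeen();
        System.out.println("mixed after object literal: " + tracker.isMixed());
    }
}
```

A set made up only of references never sets either flag, so this catches 
the mixed case but still says nothing about whether the set matches its 
LHS; that part has to wait for the semantic stage anyway.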

Sam


