[antlr-interest] Predicate hoisting pain
Sam Barnett-Cormack
s.barnett-cormack at lancaster.ac.uk
Mon Apr 13 08:59:23 PDT 2009
Sam Barnett-Cormack wrote:
> Jim Idle wrote:
>> Sam Barnett-Cormack wrote:
>>> Hi all,
>>>
>>> So, in my grammar I have need to re-use rules so they ultimately refer
>>> to a different rule (so I don't have to duplicate
>>> intersection/union/exception rules). I use a parameter and gated
>>> predicates, like so:
>>>
>>> elements[boolean os]
>>> : {!$os}?=>subtypeElements
>>> | {$os}?=>objectSetElements
>>> | LPAREN! elementSetSpec[$os] RPAREN!
>>> ;
>>>
>>> This is ultimately referred to from two places. The first, which
>>> generates code that's just fine, is:
>>>
>>> elementSetSpecs
>>> : rootElementSetSpec[false] (COMMA EXTMARK (COMMA
>>> additionalElementSetSpec[false])?)?
>>> -> ^(ELEMENTSET rootElementSetSpec EXTMARK? additionalElementSetSpec?)
>>> ;
>>>
>>> However, in the *slightly* more complex case:
>>>
>>> objectSetSpec
>>> : rootElementSetSpec[true] (COMMA EXTMARK
>>> additionalElementSetSpec[true]?)?
>>> | EXTMARK (COMMA additionalElementSetSpec[true])?
>>> ;
>>>
>>> The predicates get hoisted in the generated code, and then there are
>>> compile errors with undefined variable 'os'.
>>>
>>> I'm not sure why it happens in one case and not the other, and I'm even
>>> less clear on how to fix it. Can anyone help?
>>>
>>>
>> This is basically an FAQ, but you answer your own question as to why:
>> the rule parameter is local to the rule, but the predicate code must
>> be (or can be) hoisted for some decisions.
>>
>> The solution is relatively simple, but it probably isn't the correct
>> solution as your need for this indicates that you are probably going
>> wrong in the way you are constructing the parser. What you should really
>> do is merge these two possibilities in the parser, then in your tree
>> walk, if you detect the use of a construct that is not valid for the
>> context, then you parse it anyway but issue a really good semantic
>> error along the lines of "Element specs like FOO cannot be used within
>> specs for BARs". If you do not do this then your users will just get
>> "Syntax error at FOO!", and unless they are already very knowledgeable
>> about the language, then they won't really know what this means.
>
>> However, remember the rules of good construction:
>>
>> 1) Anything that can be moved from a syntactic error in the lexer to
>> a semantic error, or left to the parser, should be;
>> 2) Anything that can be moved from a syntax error in the parser to a
>> semantic error in the tree walker, should be;
>>
>> In general this means that error messages from your front end will be as
>> good as they can be:
>>
>> 1) "Unknown character '\u8290'" in the lexer becomes: "Line 20, offset
>> 42: The character '\u8290' is not a valid character for use in a variable
>> name!"
>> 2) "No viable alt at 'FOO'", becomes "Line 42, offset 22: The construct
>> FOO cannot be used within a BAR, only within a BAZ, try specifying as a
>> BARRY."
>
> So I would merge the two in the parser, and then separate them again in
> the tree parser, and then do the context-sensitive validation there? In
> this case, a user would be more likely to make a mistake that looks like
> a mixture of a valueSet and an objectSet, rather than use one in the
> place of another. They look different in any but the simplest cases
> (where all the values or objects in a set are references, i.e. names).
>
> However, something else in the language requires differentiation between
> valueSets and objectSets to be deferred until semantic-building time
> (when the type of the LHS of an expression is known), so I guess I'll
> have to do that. It just sticks in my craw to let the parser allow
> through something that isn't valid as *either*. However, there's a way
> around that as well... boolean flags that get set on seeing a value
> literal or object literal (the things that can't be mixed). Then a mixed
> case won't get passed. However, I suspect that might be better left to
> the semantic stage, where each element of the set can be validated,
> based on the LHS that it goes with.
Which doesn't entirely work unless I extend that concept further than I
have time to do, as there's too much ambiguity between the two possible
parse styles... *sigh*
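
For the record, here is a rough sketch of what the semantic-stage check
might look like, in plain Java. Everything in it (ElementKind, SetElement,
SetValidator, the lhsIsObjectSet flag) is made up for illustration and
stands in for whatever the tree walker actually sees; none of it is ANTLR
API. It assumes the parser has already accepted a merged set of elements
and that the kind of the LHS is known by the time the check runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical element kinds that a merged "elements" rule might produce.
enum ElementKind { VALUE_LITERAL, OBJECT_LITERAL, REFERENCE }

// Hypothetical stand-in for a tree node: kind, source text, and position.
final class SetElement {
    final ElementKind kind;
    final String text;
    final int line;
    final int offset;

    SetElement(ElementKind kind, String text, int line, int offset) {
        this.kind = kind;
        this.text = text;
        this.line = line;
        this.offset = offset;
    }
}

final class SetValidator {
    // Checks every element against what the LHS expects. References (names)
    // are accepted in both kinds of set; literals must match the LHS.
    static List<String> validate(List<SetElement> elements, boolean lhsIsObjectSet) {
        List<String> errors = new ArrayList<String>();
        ElementKind wrong = lhsIsObjectSet
                ? ElementKind.VALUE_LITERAL
                : ElementKind.OBJECT_LITERAL;
        for (SetElement e : elements) {
            if (e.kind == wrong) {
                // Report a positioned semantic error instead of a bare syntax error.
                errors.add("Line " + e.line + ", offset " + e.offset + ": "
                        + (lhsIsObjectSet ? "value" : "object")
                        + " literal '" + e.text + "' cannot be used within "
                        + (lhsIsObjectSet ? "an object set" : "a value set"));
            }
        }
        return errors;
    }
}
```

A mixed set then yields one error per offending literal, whichever way the
LHS turns out, rather than a single "no viable alt" from the parser.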
Sam