[antlr-interest] use of semantic predicates and hoisting

Tue Nov 23 09:52:40 PST 2010

In that case, either drop the predicates entirely and just accept ID, or list
all the keywords in the lexer and add a parser rule that also accepts them as
ID wherever a plain identifier is allowed. Either way, validate the ID in a
later semantic pass. The golden rule is to defer checking and error reporting
to as late a phase as possible. That way your error messages make more sense,
because they come from your semantic checks rather than your syntactic checks.
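
A rough sketch of what such a later pass could look like, in plain Java with
no ANTLR dependency. It assumes dataTypeName and dataTypeLevel are reduced to a
bare ID with no predicate, and that the parser records each ID together with
the role it was parsed in. The IdRef class, the Role enum, and the two tiny
vocabularies are hypothetical, made up purely for illustration; the two is...
methods are stand-ins for the real DataTypeSpecEnumerations checks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of a post-parse semantic check. Nothing here is generated by ANTLR.
class DataTypeSemanticCheck {

    // The grammatical role an ID was parsed in.
    enum Role { DATA_TYPE_NAME, DATA_TYPE_LEVEL }

    // One ID occurrence recorded by the parser (hypothetical helper type).
    static final class IdRef {
        final String text;
        final int line;
        final Role role;

        IdRef(String text, int line, Role role) {
            this.text = text;
            this.line = line;
            this.role = role;
        }
    }

    // Invented placeholder vocabularies, NOT the real enumerations.
    private static final Set<String> NAMES =
            new HashSet<>(Arrays.asList("expression", "methylation"));
    private static final Set<String> LEVELS =
            new HashSet<>(Arrays.asList("low", "high"));

    // Stand-ins for DataTypeSpecEnumerations.isDataTypeName / isDataTypeLevel.
    static boolean isDataTypeName(String s) { return NAMES.contains(s); }
    static boolean isDataTypeLevel(String s) { return LEVELS.contains(s); }

    // Validate every recorded ID against the role it was parsed in and
    // collect one message per invalid ID.
    static List<String> check(List<IdRef> refs) {
        List<String> errors = new ArrayList<>();
        for (IdRef ref : refs) {
            boolean isName = ref.role == Role.DATA_TYPE_NAME;
            boolean ok = isName ? isDataTypeName(ref.text)
                                : isDataTypeLevel(ref.text);
            if (!ok) {
                String kind = isName ? "dataTypeName" : "dataTypeLevel";
                errors.add("line " + ref.line + ": " + ref.text
                        + " is not a valid " + kind);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<IdRef> refs = Arrays.asList(
                new IdRef("expression", 1, Role.DATA_TYPE_NAME),
                new IdRef("hihg", 2, Role.DATA_TYPE_LEVEL));
        for (String error : check(refs)) {
            System.out.println(error);
        }
    }
}
```

Because the syntactic role is already known by the time the check runs, each
message can name the exact category that failed, which is the kind of error
text a hoisted predicate cannot give you.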

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Arthur Goldberg
> Sent: Monday, November 22, 2010 4:23 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] use of semantic predicates and hoisting
> 
> All
> 
> I've built a grammar that uses a couple of sets of keywords in multiple
> places. They're called dataTypeNames and dataTypeLevels (they're actually
> genetic measurement data types, and levels for discrete values).
> The grammar works -- ANTLR is cool -- but I'm having trouble producing
> satisfactory error messages.
> 
> Here are elided versions of some key rules.
> 
> dataTypeSpec
>      :
>      dataTypeName
>      | dataTypeLevel
>      | discreteDataType
>      ;
> 
> discreteDataType
>          :
>          ( dataTypeName comparisonOP dataTypeLevel ) |
>          ( dataTypeName SIGNED_INT )
>          ;
> 
> dataTypeName
>      :
>      { DataTypeSpecEnumerations.isDataTypeName(input.LT(1).getText()) }?
>      ID
>      ;
> 
> dataTypeLevel
>      :
>      { DataTypeSpecEnumerations.isDataTypeLevel(input.LT(1).getText()) }?
>      ID
>      ;
> 
> comparisonOP
>      :    COMPARISON_OP
>          {
>              // ACTION: convert to enumeration
>              $theComparisonOp = ComparisonOp.convertCode( $COMPARISON_OP.text );
>          }
>      ;
> 
> COMPARISON_OP
>      // awkward to convert to enumeration in COMPARISON_OP cuz of char /
>      // text distinction for 1/longer tokens; see bottom p. 139 T. Parr
>      : ( '<=' | '<' | '>' | '>=' )
>      ;
> 
> ID  :    ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
>          ;
> 
> SIGNED_INT : ('-')?    '0'..'9'+ ;
> 
> DataTypeSpecEnumerations.isDataTypeName and
> DataTypeSpecEnumerations.isDataTypeLevel indicate whether a String is a
> valid dataTypeName or dataTypeLevel, respectively. Those functions are a
> little complex, so they cannot be hard-coded in the lexer.
> 
> The parser does properly recognize well-formed dataTypeSpecs. But when
> the input is wrong, I want to be able to report errors like "<token> is
> not a valid <dataTypeName>" or "<token> is not a valid <dataTypeLevel>".
> (Given that dataTypeName and dataTypeLevel are each just an ID, the
> same token may get reported multiple times. That's OK.)
> 
> My thought was to override
> String org.antlr.runtime.BaseRecognizer.getErrorMessage(RecognitionException e, String[] tokenNames)
> and report errors when e is a FailedPredicateException. But to my
> surprise, bad dataTypeNames or dataTypeLevels don't generate a
> FailedPredicateException, because the predicates are hoisted into
> dataTypeSpec.
> 
> What's a good way to handle this? I don't want to combine dataTypeName
> and dataTypeLevel into a single production, because they're used in
> different places. The predicates must go before the IDs; otherwise
> dataTypeSpec won't compile.
> 
> Is it possible to turn off hoisting?
> 
> Thanks
> Arthur
> 
> --
> Senior Research Scientist
> Computational Biology
> Memorial Sloan-Kettering Cancer Center
> 
> 
> 