[antlr-interest] use of semantic predicates and hoisting

Mon Nov 22 16:22:35 PST 2010

All

I've built a grammar that uses a couple of sets of keywords in multiple 
places. They're called dataTypeNames and dataTypeLevels (they're actually 
genetic measurement data types, and levels for discrete values).
The grammar works -- ANTLR is cool -- but I'm having trouble producing 
satisfactory error messages.

Here are elided versions of some key rules.

dataTypeSpec
     :
     dataTypeName
     | dataTypeLevel
     | discreteDataType
     ;

discreteDataType
         :
         ( dataTypeName comparisonOP dataTypeLevel ) |
         ( dataTypeName SIGNED_INT )
         ;

dataTypeName
     :
     { DataTypeSpecEnumerations.isDataTypeName( input.LT(1).getText()) }?
     ID
     ;

dataTypeLevel
     :
     { DataTypeSpecEnumerations.isDataTypeLevel(input.LT(1).getText())}?
     ID
     ;

comparisonOP
     :    COMPARISON_OP
         {
  // ACTION: convert to enumeration
  $theComparisonOp = ComparisonOp.convertCode( $COMPARISON_OP.text );
         }
     ;

COMPARISON_OP
     // awkward to convert to enumeration in COMPARISON_OP cuz of char /
     // text distinction for 1/longer tokens; see bottom p. 139 T. Parr
     : ( '<=' | '<' | '>' | '>=' )
     ;

ID  :    ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
         ;

SIGNED_INT : ('-')?    '0'..'9'+ ;

DataTypeSpecEnumerations.isDataTypeName and 
DataTypeSpecEnumerations.isDataTypeLevel indicate whether a String is a 
valid dataTypeName or dataTypeLevel, respectively. Those functions are a 
little complex, so they cannot be hard-coded in the lexer.
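
For concreteness, here is a minimal sketch of the contract those two 
predicates satisfy. The lookup sets are placeholders (the real checks are 
more involved), and the EXAMPLE_* strings are invented for illustration 
only; they are not the actual data type names or levels.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the real class: only the two method names
// and their String -> boolean shape come from the grammar above.
final class DataTypeSpecEnumerations {
    // Placeholder vocabularies (assumption, not the real values).
    private static final Set<String> NAMES = new HashSet<String>(
            Arrays.asList("EXAMPLE_NAME_A", "EXAMPLE_NAME_B"));
    private static final Set<String> LEVELS = new HashSet<String>(
            Arrays.asList("EXAMPLE_LEVEL_LOW", "EXAMPLE_LEVEL_HIGH"));

    private DataTypeSpecEnumerations() {}

    // True if text is an accepted dataTypeName; null is rejected.
    static boolean isDataTypeName(String text) {
        return text != null && NAMES.contains(text);
    }

    // True if text is an accepted dataTypeLevel; null is rejected.
    static boolean isDataTypeLevel(String text) {
        return text != null && LEVELS.contains(text);
    }
}
```

The predicates in dataTypeName and dataTypeLevel call these with the text 
of the next token, input.LT(1).getText().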
The parser does properly recognize well-formed dataTypeSpecs. But when 
the input is wrong, I want to be able to report errors like 
"<token> is not a valid <dataTypeName>" or "<token> is not a valid 
<dataTypeLevel>".
(Given that dataTypeName and dataTypeLevel are each just an ID, the same 
token may get reported multiple times. That's OK.)
My thought was to override 
String org.antlr.runtime.BaseRecognizer.getErrorMessage( 
RecognitionException e, String[] tokenNames ) and report 
errors when e is a FailedPredicateException.
But to my surprise, bad dataTypeNames or dataTypeLevels don't generate a 
FailedPredicateException, because the predicates are hoisted into 
dataTypeSpec. What's a good way to handle this?
I don't want to combine dataTypeName and dataTypeLevel into a single 
production, because they're used in different places.
The predicates must go before the IDs; otherwise dataTypeSpec won't 
compile.
Is it possible to turn off hoisting?
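
One direction I could imagine, as a sketch only: since hoisting changes 
which exception type arrives, the message text could be built from the 
offending token's text instead of from the exception class. The helper 
below is hypothetical (DataTypeErrorText and notValid are names made up 
for illustration) and does not touch the ANTLR runtime. How it would be 
wired into an overridden getErrorMessage is an assumption, noted in a 
comment.

```java
// Hypothetical helper: formats "<token> is not a valid <kind>" messages.
// Assumption: an overridden getErrorMessage would pass in the text of
// the offending token (e.g. from e.token) plus the rule names that were
// acceptable at that point; that wiring is not shown here.
final class DataTypeErrorText {
    private DataTypeErrorText() {}

    // Joins the expected kinds with " or " after the fixed prefix.
    static String notValid(String tokenText, String... expectedKinds) {
        StringBuilder sb = new StringBuilder();
        sb.append(tokenText).append(" is not a valid ");
        for (int i = 0; i < expectedKinds.length; i++) {
            if (i > 0) {
                sb.append(" or ");
            }
            sb.append(expectedKinds[i]);
        }
        return sb.toString();
    }
}
```

For a failure inside dataTypeSpec, where either kind would have been 
acceptable, the call would be 
DataTypeErrorText.notValid(text, "dataTypeName", "dataTypeLevel").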

Thanks
Arthur

-- 
Senior Research Scientist
Computational Biology
Memorial Sloan-Kettering Cancer Center