[antlr-interest] Semantic Predicates in a Lexer

Fri Mar 20 12:52:12 PDT 2009

At 07:02 21/03/2009, Paul Bouché (NSN) wrote:
>NUMBER
>     :
>     {noColonInNames || !noColonInNames}?=>
>     DIGIT_+
>     ;

That predicate is completely redundant: it is always 
true, so the rule is equivalent to a plain DIGIT_+.

>NAME
>     : {!noColonInNames}?=> (LETTERORDIGIT_ | COLON)+
>     | {noColonInNames }?=> LETTERORDIGIT_+ { $type = SIMPLENAME; }
>     | {noColonInNames }?=> COLON { $type = COLON; } { noColonInNames = false;}
>     ;
>
>If I then set the noColonInNames to true for the input
>a          :     b    a    3    a3   3a
>I get
>SIMPLENAME COLON NAME NAME NAME NAME NAME
>which is not what I want, I want the 3 to be 
>recognized as a NUMBER as it works without 
>predicates. Imo there is a bug or I cannot 
>understand why it does not work.

Given the input "3", both NAME and NUMBER are 
viable output rules since both consume the exact 
same input.  ANTLR should therefore choose 
whichever one is listed first (which ought to be 
NUMBER in this case, unless you've listed the 
rules out-of-order).  If you don't want to be at 
the mercy of this sort of thing then you should 
modify the NAME rule so that a NAME is not 
permitted to begin with a digit.
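
A rough, untested sketch of that last suggestion, in ANTLR 3 
syntax, might look like the following.  LETTER_ is a 
hypothetical fragment that is not in the original grammar, 
its character ranges are an assumption, and COLON is assumed 
to be a fragment rule (as the $type assignment implies).

    // Hypothetical helper: not part of the original grammar.
    fragment LETTER_ : 'a'..'z' | 'A'..'Z' ;

    // Same three alternatives as before, except that no
    // alternative can start with a digit any more.
    NAME
        : {!noColonInNames}?=> (LETTER_ | COLON) (LETTERORDIGIT_ | COLON)*
        | {noColonInNames }?=> LETTER_ LETTERORDIGIT_* { $type = SIMPLENAME; }
        | {noColonInNames }?=> COLON { $type = COLON; noColonInNames = false; }
        ;

With a rule like this, "3" can only match NUMBER, so rule 
order no longer matters for that input.  Note that "3a" 
would then lex as a NUMBER followed by a name token rather 
than as a single NAME.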

But given your example input and what you've said 
thus far, I think you're overcomplicating 
things.  If there are no whitespace limitations 
between the name fragments and the colon (i.e. 
whitespace is permitted and ignored), then what 
you really should do is to remove the NAME rule 
entirely and just have the lexer emit SIMPLENAME, 
COLON, and NUMBER.  Then up in the parser you can 
define a 'name' rule that recognises SIMPLENAME 
COLON SIMPLENAME (or whatever) as a single logical unit.
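
A minimal, untested sketch of that split, again in ANTLR 3 
syntax, could look like this.  The WS rule, the LETTER_ 
fragment, the character ranges, and the exact shape of the 
'name' rule are all assumptions made for illustration; 
adjust them to whatever your language actually allows.

    // Parser rule: one or more simple names joined by colons.
    // The repetition is an assumption; use whatever shape fits.
    name
        : SIMPLENAME (COLON SIMPLENAME)*
        ;

    // Lexer rules: no predicates and no NAME rule at all.
    NUMBER     : DIGIT_+ ;
    SIMPLENAME : LETTER_ LETTERORDIGIT_* ;
    COLON      : ':' ;

    // Assumed whitespace handling: kept off the parser's channel.
    WS         : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;

    fragment DIGIT_         : '0'..'9' ;
    fragment LETTER_        : 'a'..'z' | 'A'..'Z' ;
    fragment LETTERORDIGIT_ : LETTER_ | DIGIT_ ;

Because the lexer no longer needs the noColonInNames flag, 
each token is decided by its characters alone, and the 
question of what counts as a full name moves into the 
parser, where the surrounding context is available.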