[antlr-interest] Semantic Predicates in a Lexer
Gavin Lambert
antlr at mirality.co.nz
Fri Mar 20 12:52:12 PDT 2009
At 07:02 21/03/2009, Paul Bouché (NSN) wrote:
>NUMBER
> :
> {noColonInNames || !noColonInNames}?=>
> DIGIT_+
> ;
That predicate is completely redundant: the
condition (x || !x) is always true, so it gates
nothing and the rule can just be DIGIT_+.
>NAME
> : {!noColonInNames}?=> (LETTERORDIGIT_ | COLON)+
> | {noColonInNames }?=> LETTERORDIGIT_+ {
> $type = SIMPLENAME; }
> | {noColonInNames }?=> COLON { $type =
> COLON; } { noColonInNames = false;}
> ;
>
>If I then set the noColonInNames to true for the input
>a : b a 3 a3 3a
>I get
>SIMPLENAME COLON NAME NAME NAME NAME NAME
>which is not what I want, I want the 3 to be
>recognized as a NUMBER as it works without
>predicates. Imo there is a bug or I cannot
>understand why it does not work.
Given the input "3", both NAME and NUMBER are
viable output rules since both consume the exact
same input. ANTLR should therefore choose
whichever one is listed first (which ought to be
NUMBER in this case, unless you've listed the
rules out-of-order). If you don't want to be at
the mercy of this sort of thing then you should
modify the NAME rule so that a NAME is not
permitted to begin with a digit.
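As a rough sketch of that last suggestion (with
the predicates left out for clarity, and with a
LETTER_ fragment that I'm making up here since
your grammar only shows LETTERORDIGIT_), the
rules might look something like this:

```antlr
// Sketch only: LETTER_ is a hypothetical fragment, not taken from
// the original grammar; the DIGIT_ definition is assumed as well.
fragment LETTER_ : 'a'..'z' | 'A'..'Z' ;
fragment DIGIT_  : '0'..'9' ;

// A run of digits is always a NUMBER.
NUMBER : DIGIT_+ ;

// The first character must be a letter, so the input "3" can no
// longer match NAME and the NAME/NUMBER ambiguity disappears.
NAME : LETTER_ (LETTER_ | DIGIT_)* ;
```

Note that under these assumptions an input like
"3a" would most likely come out as NUMBER followed
by NAME rather than as a single NAME, so whether
this fits depends on what you want for that case.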
But given your example input and what you've said
thus far, I think you're overcomplicating
things. If there are no whitespace limitations
between the name fragments and the colon (i.e.
whitespace is permitted and ignored), then what
you really should do is to remove the NAME rule
entirely and just have the lexer emit SIMPLENAME,
COLON, and NUMBER. Then up in the parser you can
define a 'name' rule that recognises SIMPLENAME
COLON SIMPLENAME (or whatever) as a single logical unit.
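Something along these lines, as a minimal sketch
only. The grammar name, the WS rule, the LETTER_
fragment, and the exact shape of the 'name' rule
are all assumptions on my part, so adjust them to
whatever your language actually allows:

```antlr
// Sketch under stated assumptions; none of these definitions are
// copied from the original grammar.
grammar Names;

// Parser: treat colon-separated simple names as one logical unit.
// The repetition is an assumption; use whatever shape you need.
name : SIMPLENAME (COLON SIMPLENAME)* ;

// Lexer: no predicates and no state flag, just three plain tokens.
NUMBER     : DIGIT_+ ;
SIMPLENAME : LETTER_ (LETTER_ | DIGIT_)* ;
COLON      : ':' ;

// Whitespace goes to the hidden channel, so "a : b" and "a:b"
// look the same to the parser.
WS : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;

fragment LETTER_ : 'a'..'z' | 'A'..'Z' ;
fragment DIGIT_  : '0'..'9' ;
```

The point of doing it this way is that the lexer
no longer needs to know whether a colon is part of
a name; that decision moves up into the parser,
where the surrounding context is available.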