[antlr-interest] Tail ambiguity

John B. Brodie jbb at acm.org
Fri May 18 16:03:31 PDT 2007


Greetings!

Laurie Harper asked:
>I'm trying to match a sequence of literal text interspersed with 
>expression, where the expression can take either of two forms but can't 
>be mixed. For example, 'a ${b} c ${d}' and 'a #{b} c #{d}' are legal, 
>but 'a ${b} c #{d}' is not.
>...snippedity-snip...
>Here's the relevant rule set:
>
>rValue
>       : (rValueComponent1)+ EOF
>       | (rValueComponent2)+ EOF
>       ;
>
>rValueComponent1
>       : DOLLAR LCURLY expression RCURLY
>       | LiteralExpression
>       ;
>
>rValueComponent2
>       : HASH LCURLY expression RCURLY
>       | LiteralExpression
>       ;

How about this (untested):

rValue : rV_List EOF ;

rV_List
    : ( rValueComponent1 rV1_List )
    | ( rValueComponent2 rV2_List )
    | ( LiteralExpression rV_List? )
    ;

rValueComponent1 : DOLLAR LCURLY expression RCURLY ;
rV1_List : ( rValueComponent1 | LiteralExpression )* ;

rValueComponent2 : HASH LCURLY expression RCURLY ;
rV2_List : ( rValueComponent2 | LiteralExpression )* ;



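Basically, we left-factor your two alternatives: rV_List consumes all the
leading literals, and the parser commits to one form only when it reaches
the first expression (DOLLAR or HASH). From then on, rV1_List or rV2_List
accepts only that form, plus further literals.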
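
To check the idea without generating a parser, here is a minimal sketch of
the same decision logic in Python. It is an illustration, not ANTLR output.
It assumes the input is already tokenized, and the token names LIT, DOLLAR
and HASH are hypothetical: DOLLAR and HASH each stand for one complete
"${expression}" or "#{expression}" component.

```python
# Hypothetical token kinds (assumptions, not part of the grammar above):
# LIT is a LiteralExpression; DOLLAR / HASH each stand for one complete
# "${expression}" / "#{expression}" component.
LIT, DOLLAR, HASH = "LIT", "DOLLAR", "HASH"


def accepts(tokens):
    """Return True if the token list matches  rValue : rV_List EOF ;"""
    i = 0
    # rV_List, third alternative: consume all leading literals.
    while i < len(tokens) and tokens[i] == LIT:
        i += 1
    if i == len(tokens):
        # Input was literals only; rV_List needs at least one token.
        return i > 0
    # The first expression decides whether rV1_List or rV2_List follows.
    kind = tokens[i]
    if kind not in (DOLLAR, HASH):
        return False
    # rV1_List / rV2_List: only the chosen kind or literals, zero or more.
    return all(t in (kind, LIT) for t in tokens[i + 1:])
```

With these assumptions, accepts([LIT, DOLLAR, LIT, DOLLAR]) is True and
accepts([LIT, DOLLAR, LIT, HASH]) is False, which matches the legal and
illegal examples in the question.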

   -jbb

