[antlr-interest] disambiguating predicates / wrong decision code

Davis, Alan adavis at ti.com
Wed Jul 21 16:00:39 PDT 2010


Ron, thanks, I figured it was something along those lines. It's a bit confusing because I know there are some circumstances where predicates get hoisted into a parent rule's decision code. I guess I need to learn not to rely on the absence of a warning. My real semantic predicates are not as simple as the ones in the cutdown I posted, so I can't use your suggestion, but that's OK: I think I can work around this by defining the rules more cleverly.

(Sorry I didn't see the earlier discussion; I'm a new subscriber, and finding the relevant discussions is like drinking from a firehose.)

-Alan

-----Original Message-----
From: Ron Hunter-Duvar [mailto:ron.hunter-duvar at oracle.com] 
Sent: Wednesday, July 21, 2010 12:20 PM
To: Davis, Alan
Cc: 'antlr-interest at antlr.org'
Subject: Re: [antlr-interest] disambiguating predicates / wrong decision code

This is an example of a problem I mentioned the other day in a 
discussion on dealing with non-reserved keywords. It's more of an 
inherent limitation in Antlr than a bug. Basically, the semantic 
predicates in your vid and id rules are only used when actually matching 
those rules. They don't affect the lookahead analysis in the rules that 
use them. As far as lookahead analysis sees it, your rules are:

vexpr : ID '+' ID;
expr : ID | ID '+' ID;

But the existence of the semantic predicates suppresses the ambiguity 
warning. Antlr tries to generate a reasonable lookahead decision, but in 
cases like this it usually gets it wrong.

The simplest way to fix your particular case would be to create two 
different token types VID and ID:

VID :   ('v'|'V') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
ID  :   ('a'..'u'|'w'..'z'|'A'..'U'|'W'..'Z'|'_')
        ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;

Then there's no ambiguity.

Ron


Davis, Alan wrote:
> I'm trying to use a disambiguating semantic predicate to differentiate expressions involving different classes of identifiers. For example, "A1+A2" is an "expr" while "V1+V2" is a "vexpr". The problem is that ANTLR seems to mispredict the alternative for a valid sentence, resulting in a syntax error, even though there is no warning about an ambiguity or anything else.
>
> Here is the grammar:
>
> -----------------------------
> grammar vexpr;
> options { language = C; }
>
> program : stat+ ;
>
> stat  : vid '=' rhs ';'
>       ;
>
> rhs : vid             // handles V2
>     | vexpr           // handles V2 + V3
>     | expr            // handles(?) A1 + A2 
>     ;
>
> vexpr : vid '+' vid
>       ;
>
> expr: id
>     | id '+' id
>     ;
>
> // IDs that start with 'V'
> vid : { *(LT(1)->getText(LT(1))->chars) == 'V' }? ID 
>     ;   
>
> // IDs that don't start with 'V'
> id  : { *(LT(1)->getText(LT(1))->chars) != 'V' }? ID 
>     ; 
>
> ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* 
>     ;
>
> WS  :   (' ' | '\t' | '\r' | '\n')+ { $channel=HIDDEN; } 
>     ;  
>
> ------------------
> (The asymmetry in handling atoms is an artifact of the cutdown; if I move vid from rhs to vexpr, this problem goes away.)
>
> The parser cannot parse the following input: V1 = A1 + A2 ;
>
> When I look at the generated code for rhs (below), it seems to predict alternative 2 (vexpr) if LA(2) is '+' (8).  But '+' in that position is not sufficient to disambiguate alternative 2 from alternative 3!
>
> This seems like a bug, or am I missing something?
>
> Here is the decision code (reformatted for brevity):
>
> switch ( LA(1) )
> {        
>    case ID:
>    {         
>       if ( (LA(2) == 8) )   // '+'
>          alt2=2;      // vexpr
>
>       else if ( (( *(LT(1)->getText(LT(1))->chars) == 'V' )) )
>          alt2=1;      // vid 
>
>       else if ( (( *(LT(1)->getText(LT(1))->chars) != 'V' )) )
>          alt2=3;      // expr
>
>       else               
>       {                  
>          CONSTRUCTEX();
>          ...      
>
> - Alan Davis
>   Texas Instruments
>

-- 
Ron Hunter-Duvar | Software Developer V | 403-272-6580
Oracle Service Engineering
Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5

All opinions expressed here are mine, and do not necessarily represent
those of my employer.


