[antlr-interest] Some bugs (or features?) in Honey Badger

Mon Feb 20 04:41:57 PST 2012

On 02/19/2012 10:33 PM, Terence Parr wrote:
>
> Hi. This is surprising to me. It translates to:
>
> expr[int _p]
>      :   ( ID '=' expr[3]
>          | ID
>          )
>          ( {1>= $_p}? '+' expr[2]
>          )*
>      ;
>
> (See the -Xlog option.) It's pretty hard for that to match as a=(a+a). Are you sure?

Hi Ter,

I tested it again and was able to confirm the precedence bug. Here is an
example grammar that reproduces it:

grammar TestGrammar;

start returns [String result]
   : expr {$result = $expr.result; }
   ;

expr returns [String result]
     :   ID '=' e1=expr { $result = "(" + $ID.getText() + "=" + $e1.result + ")"; }
     |   ID { $result = $ID.getText(); }
     |   e1=expr '+' e2=expr { $result = "(" + $e1.result + "+" + $e2.result + ")"; }
     ;

ID  :    ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
     ;

This is the input:

a=a+a

The output is (a=(a+a)). With correct precedence the output should be 
((a=a)+a).
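
To make the expected result concrete, here is a minimal hand-written Java sketch that mirrors the translation Ter quoted: expr[int _p] with the {1>= $_p}? predicate guarding the '+' loop. The class TranslatedExprSketch and its helpers are hypothetical; this is not ANTLR-generated code, only a model of that translation under the assumption that it is applied literally. Tracing a=a+a through it gives ((a=a)+a): the right-hand side of '=' is parsed as expr[3], and since 1 >= 3 is false, that nested call stops before '+', leaving the '+' to the outer expr[0] call.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch mirroring the quoted translation; not ANTLR-generated code.
public class TranslatedExprSketch {
    private final List<String> tokens;
    private int pos = 0;

    private TranslatedExprSketch(String input) {
        this.tokens = tokenize(input);
    }

    // Splits input into ID tokens and the single-character tokens '=' and '+'.
    private static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '=' || c == '+') {
                out.add(String.valueOf(c));
                i++;
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < s.length() && (Character.isLetterOrDigit(s.charAt(i)) || s.charAt(i) == '_')) {
                    i++;
                }
                out.add(s.substring(start, i));
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out;
    }

    // Returns the token k positions ahead, or "<EOF>" past the end.
    private String peek(int k) {
        return pos + k < tokens.size() ? tokens.get(pos + k) : "<EOF>";
    }

    private void match(String expected) {
        if (!peek(0).equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + peek(0));
        }
        pos++;
    }

    private String matchId() {
        String t = peek(0);
        if (!(Character.isLetter(t.charAt(0)) || t.charAt(0) == '_')) {
            throw new IllegalStateException("expected ID but found " + t);
        }
        pos++;
        return t;
    }

    // expr[int _p] : ( ID '=' expr[3] | ID ) ( {1 >= _p}? '+' expr[2] )* ;
    private String expr(int p) {
        String result;
        if (peek(1).equals("=")) {                // alternative ID '=' expr[3]
            String id = matchId();
            match("=");
            result = "(" + id + "=" + expr(3) + ")";
        } else {                                   // alternative ID
            result = matchId();
        }
        while (1 >= p && peek(0).equals("+")) {    // ( {1 >= _p}? '+' expr[2] )*
            match("+");
            result = "(" + result + "+" + expr(2) + ")";
        }
        return result;
    }

    public static String parse(String input) {
        TranslatedExprSketch parser = new TranslatedExprSketch(input);
        String result = parser.expr(0);
        if (parser.pos != parser.tokens.size()) {
            throw new IllegalStateException("unconsumed input at " + parser.peek(0));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("a=a+a")); // prints ((a=a)+a)
    }
}
```

Under this model the same loop also makes a+a+a come out left-associative as ((a+a)+a), while a=b=c nests to the right as (a=(b=c)).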

I used the jar from

http://antlr.org/download/antlr-4.0ea-complete.jar

and re-downloaded it to make sure I did not have an outdated version.

The output was produced using this code:

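import java.io.FileInputStream;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

// Lex and parse test.input, then print the string built by the grammar actions
TestGrammarLexer lex = new TestGrammarLexer(
        new ANTLRInputStream(new FileInputStream("test.input")));
CommonTokenStream tokens = new CommonTokenStream(lex);

TestGrammarParser.StartContext i = new TestGrammarParser(tokens).start();

System.out.println(i.result);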

>
>> The precedence should be from top to bottom, right? So the input a=a+a
>> should be parsed as (a=a)+a, since the assignment rule is on the top.
>> However, this is not the case, instead, it is parsed as a=(a+a). Bug, or
>> am I interpreting something wrong?
>>
>> 2. Name binding
>>
>> Consider this example:
>>
>> expr returns [int r]
>>      : '-' expr { $r = - $expr.r; }
>>
>> In this example $expr should bind to the sub-expression in my opinion.
>> However, it does not. Since the rule is also named expr, $expr refers to
>> the rule context instead of the context of the sub-expression. I think
>> most of the time this is not what the user wants.
> I think this is consistent with v3. I'll add it to the list to think about. Thanks!
> Ter

Yes, it is consistent with v3; however, v3 didn't have these crazy
left-recursive rules :).
With these rules, it is much more common to have a non-terminal of the 
same type as the rule itself.
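
A minimal Java sketch of why the binding matters, assuming the rule's return value starts at Java's default of 0 before the action runs. ExprContext and negate are hypothetical stand-ins, not ANTLR's generated classes. If $expr resolves to the enclosing rule's own context, the action $r = - $expr.r reads the value it is about to assign instead of the operand's. The test grammar earlier in this message avoids the ambiguity with labels (e1=expr, then $e1.result).

```java
// Hypothetical model of the '-' expr alternative; ExprContext is not ANTLR's generated class.
public class NameBindingSketch {
    // Stand-in for a rule context carrying `returns [int r]`.
    static final class ExprContext {
        int r; // Java default is 0 until an action assigns it
    }

    // Models: expr returns [int r] : '-' expr { $r = - $expr.r; } ;
    // `operand` is the value already computed for the nested expr.
    static ExprContext negate(int operand, boolean exprMeansEnclosingRule) {
        ExprContext self = new ExprContext(); // context of this expr invocation
        ExprContext sub = new ExprContext();  // context of the nested expr
        sub.r = operand;
        ExprContext bound = exprMeansEnclosingRule ? self : sub; // what $expr resolves to
        self.r = -bound.r;                    // the action $r = - $expr.r
        return self;
    }

    public static void main(String[] args) {
        System.out.println(negate(5, false).r); // bound to the sub-expression: prints -5
        System.out.println(negate(5, true).r);  // bound to the enclosing rule: prints 0
    }
}
```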