[antlr-interest] Some bugs (or features?) in Honey Badger

Mon Feb 20 10:53:06 PST 2012

Oops. Prefix left-recursive alts weren't recognized when they had actions on the end.

https://github.com/parrt/antlr4/commit/7287f5a2d3719f992f34bfea5071c8d7d9c16ab5

grab parrt/antlr4 again :)

Thanks,
Ter
On Feb 20, 2012, at 4:41 AM, Jan Finis wrote:

> On 02/19/2012 10:33 PM, Terence Parr wrote:
>> 
>> 
>> Hi. This surprises me. It translates to:
>> 
>> expr[int _p]
>>     :   ( ID '=' expr[3] 
>>         | ID 
>>         )
>>         ( {1 >= $_p}? '+' expr[2]
>>         )*
>>     ;
>> 
>> (See the -Xlog option.) Pretty hard for that to match as a=(a+a). Are you sure?
> 
> Hi Ter,
> 
> I tested it again and was able to confirm the precedence bug, here is an example grammar producing the bug:
> 
> grammar TestGrammar;
> 
> start returns [String result]
>   : expr {$result = $expr.result; }
>   ;
> 
> expr returns [String result]
>     :   ID '=' e1=expr { $result = "(" + $ID.getText() + "=" + $e1.result + ")"; }
>     |   ID { $result = $ID.getText(); }
>     |   e1=expr '+' e2=expr { $result = "(" + $e1.result + "+" + $e2.result + ")"; }
>     ;
> 
> ID  :    ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
>     ;
> 
> This is the input:
> 
> a=a+a
> 
> The output is (a=(a+a)). With correct precedence the output should be ((a=a)+a).
> 
> I used the jar from 
> 
> http://antlr.org/download/antlr-4.0ea-complete.jar
> 
> and redownloaded it to make sure that I do not have an outdated version.
> 
> The output was produced using this code:
> 
> TestGrammarLexer lex = new TestGrammarLexer(new ANTLRInputStream(new FileInputStream(new File("test.input"))));
> CommonTokenStream tokens = new CommonTokenStream(lex);
>            
> StartContext i = new TestGrammarParser(tokens).start();
>         
> System.out.println(i.result);
> 
>> 
>>> the precedence should be from top to bottom, right? So, the input  a=a+a 
>>> should be parsed as (a=a)+a, since the assignment rule is on the top. 
>>> However, this is not the case, instead, it is parsed as a=(a+a). Bug, or 
>>> am I interpreting something wrong?
>>> 
>>> 2. Name binding
>>> 
>>> Consider this example:
>>> 
>>> expr returns [int r]
>>>     : '-' expr { $r = - $expr.r; }
>>> 
>>> In this example $expr should bind to the sub-expression in my opinion. 
>>> However, it does not. Since the rule is also named expr, $expr refers to 
>>> the rule context instead of the context of the sub-expression. I think 
>>> most of the time this is not what the user wants.
>> I think this is consistent with v3. I'll add it to the list to think about. Thanks!
>> Ter
> 
> Yes, it is consistent with v3; however, v3 didn't have these crazy left recursive rules :).
> With these rules, it is much more common to have a non-terminal of the same type as the rule itself.
>
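
For readers following the precedence discussion: the translated rule Ter quotes above (expr[int _p] with the {1 >= $_p}? predicate) can be traced by hand. Below is a minimal Java sketch of that rule written as a small recursive-descent parser. It is not ANTLR's generated code; the class name PrecedenceSketch and its helper methods are hypothetical and exist only for illustration. It assumes the input contains nothing but IDs, '=' and '+', with no whitespace, as in the test grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of the translated rule (hypothetical names, not ANTLR output):
//   expr[int _p] : ( ID '=' expr[3] | ID ) ( {1 >= $_p}? '+' expr[2] )* ;
final class PrecedenceSketch {
    private final List<String> tokens;
    private int pos = 0;

    private PrecedenceSketch(String input) {
        this.tokens = tokenize(input);
    }

    private static boolean isIdStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    private static boolean isIdPart(char c) {
        return isIdStart(c) || (c >= '0' && c <= '9');
    }

    // Splits the input into ID, '=' and '+' tokens, following the ID rule of the test grammar.
    private static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (isIdStart(c)) {
                int j = i + 1;
                while (j < s.length() && isIdPart(s.charAt(j))) {
                    j++;
                }
                out.add(s.substring(i, j));
                i = j;
            } else if (c == '=' || c == '+') {
                out.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out;
    }

    // Returns the current token, or null at end of input.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    // Consumes one ID token or fails.
    private String matchId() {
        String t = peek();
        if (t == null || !isIdStart(t.charAt(0))) {
            throw new IllegalStateException("expected ID at token " + pos);
        }
        pos++;
        return t;
    }

    // Mirrors expr[int _p]: a primary alternative followed by the predicated '+' loop.
    private String expr(int p) {
        String id = matchId();
        String left;
        if ("=".equals(peek())) {
            pos++;                                  // consume '='
            left = "(" + id + "=" + expr(3) + ")";  // ID '=' expr[3]
        } else {
            left = id;                              // ID
        }
        while (1 >= p && "+".equals(peek())) {      // {1 >= $_p}? '+' expr[2]
            pos++;                                  // consume '+'
            left = "(" + left + "+" + expr(2) + ")";
        }
        return left;
    }

    // Parses a whole input string and returns the fully parenthesized result.
    static String parse(String input) {
        PrecedenceSketch p = new PrecedenceSketch(input);
        String result = p.expr(0);
        if (p.peek() != null) {
            throw new IllegalStateException("unconsumed input at token " + p.pos);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("a=a+a")); // prints ((a=a)+a)
    }
}
```

Under this reading of the translated rule, the right-hand side of '=' is parsed with _p = 3, so the '+' loop cannot fire inside it and the '+' is picked up by the outer call. That gives ((a=a)+a), the result Jan expected, which fits Ter's remark that the rule as translated could hardly produce a=(a+a).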