[antlr-interest] Some bugs (or features?) in Honey Badger

Jan Finis finis at in.tum.de
Tue Feb 21 04:20:54 PST 2012


Works fine now, thanks!

*Another strange thing I encountered:*

The "good" case:

expr
     :   ID op=('=' | '+=' | '-=') expr
     ;

In this case op becomes the Token representing the matched operator, 
which is fine and what the user intended.

The "bad" case:
expr
     :   ID op=('=' | '+=' | expr) expr
     ;

In this (strange) case op is still of type Token. Regardless of the 
matched alternative, it is initialized with _input.LT(1); (which was 
correct in the good case).
If the third alternative (expr) is matched, that does not make any sense.

It is not clear what the user intended in this case, but I think they 
wanted to save the Token/context of the matching alternative into op.
So, op should be an Object (in Java) so that it can hold either an 
ExprContext or a Token (or contexts and tokens should receive a common 
superclass/interface and op should be of that class/interface). op should be
initialized with the matched context/token instead of LT(1).
Another alternative would be to forbid such constructs completely, but 
I think they could be handy sometimes.

Code generated now (by the way, the cast is superfluous, since _localctx 
is already an ExprContext):

public static class ExprContext extends ParserRuleContext<Token> {
     public Token op;

...

((ExprContext)_localctx).op = _input.LT(1);
switch ( getInterpreter().adaptivePredict(_input,0,_ctx) ) {
     case 1:
         {
         setState(10); match(5);
         }
         break;
     case 2:
         {
         setState(12); match(3);
         }
         break;
     case 3:
         {
         setState(14); expr(3);
         }
         break;
}

How it could look:

public static class ExprContext extends ParserRuleContext<Token> {
     // TokenOrContext is an interface implemented by Tokens and contexts, for example
     public TokenOrContext op;

...

switch ( getInterpreter().adaptivePredict(_input,0,_ctx) ) {
     case 1:
         {
         setState(10); _localctx.op = match(5);
         }
         break;
     case 2:
         {
         setState(12); _localctx.op = match(3);
         }
         break;
     case 3:
         {
         setState(14); _localctx.op = expr(3);
         }
         break;
}
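
To make the idea concrete, here is a minimal, self-contained sketch of the 
common-interface variant. It does not use the ANTLR runtime: SimpleToken, 
SimpleExprContext and OpDemo are hypothetical stand-ins I made up for 
illustration, and TokenOrContext is only the interface proposed above, not 
an existing ANTLR type.

```java
// Hypothetical common supertype for anything a label such as `op` may hold.
interface TokenOrContext {
    String getText();
}

// Stand-in for a lexer token (NOT the ANTLR runtime Token).
final class SimpleToken implements TokenOrContext {
    private final String text;

    SimpleToken(String text) { this.text = text; }

    @Override public String getText() { return text; }
}

// Stand-in for a generated rule context such as ExprContext.
final class SimpleExprContext implements TokenOrContext {
    // Set by whichever alternative actually matched, not by LT(1).
    TokenOrContext op;
    private final String text;

    SimpleExprContext(String text) { this.text = text; }

    @Override public String getText() { return text; }
}

final class OpDemo {
    // A user of the parse tree can find out which alternative was taken.
    static String describe(TokenOrContext op) {
        if (op instanceof SimpleToken) return "token " + op.getText();
        if (op instanceof SimpleExprContext) return "context " + op.getText();
        return "unknown";
    }

    public static void main(String[] args) {
        SimpleExprContext ctx = new SimpleExprContext("outer");

        // Alternative 1 or 2 matched: op holds the operator token.
        ctx.op = new SimpleToken("+=");
        System.out.println(describe(ctx.op));

        // Alternative 3 matched: op holds the nested expr context.
        ctx.op = new SimpleExprContext("inner");
        System.out.println(describe(ctx.op));
    }
}
```

With plain Object instead of the interface, the same instanceof checks 
would work, but callers would lose the shared getText() accessor.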

Regards,
Jan



On 02/20/2012 07:53 PM, Terence Parr wrote:
> Oops.  prefix left-recursive alts weren't recognized with actions on end.
>
> https://github.com/parrt/antlr4/commit/7287f5a2d3719f992f34bfea5071c8d7d9c16ab5
>
> grab parrt/antlr4 again :)
>
> Thanks,
> Ter
> On Feb 20, 2012, at 4:41 AM, Jan Finis wrote:
>
>> On 02/19/2012 10:33 PM, Terence Parr wrote:
>>>
>>> Hi. This surprises me. It translates to:
>>>
>>> expr[int _p]
>>>      :   ( ID '=' expr[3]
>>>          | ID
>>>          )
>>>          ( {1>= $_p}? '+' expr[2]
>>>          )*
>>>      ;
>>>
>>> (See -Xlog option).  Pretty hard for that to match as a=(a+a). Are you sure?
>>
>> Hi Ter,
>>
>> I tested it again and was able to confirm the precedence bug. Here is 
>> an example grammar that reproduces it:
>>
>> grammar TestGrammar;
>>
>> start returns [String result]
>>   : expr {$result = $expr.result; }
>>   ;
>>
>> expr returns [String result]
>>     :   ID '=' e1=expr { $result = "(" + $ID.getText() + "=" + $e1.result + ")"; }
>>     |   ID { $result = $ID.getText(); }
>>     |   e1=expr '+' e2=expr { $result = "(" + $e1.result + "+" + $e2.result + ")"; }
>>     ;
>>
>> ID  :    ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
>>     ;
>>
>> This is the input:
>>
>> a=a+a
>>
>> The output is (a=(a+a)). With correct precedence the output should be 
>> ((a=a)+a).
>>
>> I used the jar from
>>
>> http://antlr.org/download/antlr-4.0ea-complete.jar
>>
>> and redownloaded it to make sure that I do not have an outdated version.
>>
>> The output was produced using this code:
>>
>> TestGrammarLexer lex = new TestGrammarLexer(
>>     new ANTLRInputStream(new FileInputStream(new File("test.input"))));
>> CommonTokenStream tokens = new CommonTokenStream(lex);
>>
>> StartContext i = new TestGrammarParser(tokens).start();
>>
>> System.out.println(i.result);
>>
>>>> the precedence should be from top to bottom, right? So, the input  a=a+a
>>>> should be parsed as (a=a)+a, since the assignment rule is on the top.
>>>> However, this is not the case, instead, it is parsed as a=(a+a). Bug, or
>>>> am I interpreting something wrong?
>>>>
>>>> 2. Name binding
>>>>
>>>> Consider this example:
>>>>
>>>> expr returns [int r]
>>>>      : '-' expr { $r = - $expr.r; }
>>>>
>>>> In this example $expr should bind to the sub-expression in my opinion.
>>>> However, it does not. Since the rule is also named expr, $expr refers to
>>>> the rule context instead of the context of the sub-expression. I think
>>>> most of the time this is not what the user wants.
>>> I think this is consistent with v3. i'll add to list to think about. thanks!
>>> Ter
>>
>> Yes, it is consistent with v3; however, v3 didn't have these crazy 
>> left-recursive rules :).
>> With these rules, it is much more common to have a non-terminal of 
>> the same type as the rule itself.
>>
>


