[antlr-interest] Some bugs (or features?) in Honey Badger

Terence Parr parrt at cs.usfca.edu
Wed Feb 22 13:40:53 PST 2012

Hi. That's basically a bug. :) It should not allow rule references in a labeled set; only simple sets of tokens should be allowed.
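Reading Ter's reply as "a label such as op=(...) is only legal when every alternative of the block is a single token", the restriction can be sketched as a small grammar check. This is a minimal, hypothetical model: `LabeledSetCheck` and `Element` are illustrative names, not ANTLR's internal grammar classes, and the real tool may implement the check quite differently.

```java
import java.util.List;

// Hypothetical, simplified model of a labeled subrule such as op=('=' | '+=' | expr).
// These types are illustrative only; they are not ANTLR's internal grammar classes.
class LabeledSetCheck {
    // One element of a block alternative: either a token literal or a rule reference.
    record Element(String text, boolean isRuleRef) {}

    // The label is accepted only if every alternative is exactly one token,
    // i.e. the block is a simple set of tokens and the label can be typed as Token.
    static boolean isLegalTokenSetLabel(List<List<Element>> alternatives) {
        for (List<Element> alt : alternatives) {
            if (alt.size() != 1 || alt.get(0).isRuleRef()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // op=('=' | '+=' | '-=')  : the "good" case
        List<List<Element>> good = List.of(
            List.of(new Element("'='", false)),
            List.of(new Element("'+='", false)),
            List.of(new Element("'-='", false)));
        // op=('=' | '+=' | expr)  : the "bad" case, third alternative is a rule reference
        List<List<Element>> bad = List.of(
            List.of(new Element("'='", false)),
            List.of(new Element("'+='", false)),
            List.of(new Element("expr", true)));
        System.out.println(isLegalTokenSetLabel(good)); // true
        System.out.println(isLegalTokenSetLabel(bad));  // false
    }
}
```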
On Feb 21, 2012, at 4:20 AM, Jan Finis wrote:

> Works fine now, thanks!
> *Another strange thing I encountered:*
> The "good" case:
> expr
>     :   ID op=('=' | '+=' | '-=') expr
>     ;
> In this case op becomes the Token representing the matched operator, 
> which is fine and what the user intended.
> The "bad" case:
> expr
>     :   ID op=('=' | '+=' | expr) expr
>     ;
> In this (strange) case op is still of type Token. Regardless of the 
> matched alternative, it is initialized with _input.LT(1), which was 
> correct in the good case.
> If the third alternative (expr) is matched, that makes no sense.
> It is not clear what the user intended in this case, but I think he 
> wanted to save the Token/context of the matching alternative into op.
> So, op should be an Object (in Java) so that it can hold either an 
> ExprContext or a Token (or contexts and tokens should receive a common 
> superclass/interface and op should be of that class/interface). op should be
> initialized with the matched context/token instead of LT(1).
> Another alternative would be to forbid things like that completely, but 
> I think it could be handy sometimes.
> Code generated now (by the way, the cast is superfluous, since _localctx is 
> already an ExprContext):
> public static class ExprContext extends ParserRuleContext<Token> {
>     public Token op;
> ...
> ((ExprContext)_localctx).op = _input.LT(1);
> switch ( getInterpreter().adaptivePredict(_input,0,_ctx) ) {
>     case 1:
>         {
>         setState(10); match(5);
>         }
>         break;
>     case 2:
>         {
>         setState(12); match(3);
>         }
>         break;
>     case 3:
>         {
>         setState(14); expr(3);
>         }
>         break;
> }
> What it could look like:
> public static class ExprContext extends ParserRuleContext<Token> {
>     public TokenOrContext op; // TokenOrContext is an interface implemented by Tokens and contexts, for example
> ...
> switch ( getInterpreter().adaptivePredict(_input,0,_ctx) ) {
>     case 1:
>         {
>         setState(10); _localctx.op = match(5);
>         }
>         break;
>     case 2:
>         {
>         setState(12); _localctx.op = match(3);
>         }
>         break;
>     case 3:
>         {
>         setState(14); _localctx.op = expr(3);
>         }
>         break;
> }
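Jan's proposal can be modeled outside ANTLR as a small, self-contained sketch. `TokenOrContext`, `SimpleToken`, `ExprCtx` and `OpLabelDemo` are hypothetical names (the interface is only a suggestion in this thread, not an existing runtime type); the token types 5 and 3 are copied from the generated code above, and the context text is a placeholder.

```java
// Hypothetical sketch of the proposal: a common supertype so that a label on a
// mixed set such as op=('=' | '+=' | expr) can hold whichever element matched.
// None of these types exist in the ANTLR runtime; they are illustrative only.
interface TokenOrContext {
    String getText();
}

record SimpleToken(int type, String text) implements TokenOrContext {
    public String getText() { return text; }
}

record ExprCtx(String text) implements TokenOrContext {
    public String getText() { return text; }
}

class OpLabelDemo {
    // Mirrors the proposed generated code: op receives the result of whichever
    // alternative was predicted, instead of being pre-set to _input.LT(1).
    static TokenOrContext matchOp(int predictedAlt) {
        return switch (predictedAlt) {
            case 1 -> new SimpleToken(5, "=");   // match(5)
            case 2 -> new SimpleToken(3, "+=");  // match(3)
            case 3 -> new ExprCtx("a+a");        // stands in for the context returned by expr(3)
            default -> throw new IllegalArgumentException("no viable alternative");
        };
    }

    public static void main(String[] args) {
        for (int alt = 1; alt <= 3; alt++) {
            TokenOrContext op = matchOp(alt);
            // Code reading op must test the dynamic type to get token- or context-specific data.
            String kind = (op instanceof SimpleToken) ? "token" : "context";
            System.out.println(alt + ": " + kind + " " + op.getText());
        }
    }
}
```

Running `main` prints `1: token =`, `2: token +=` and `3: context a+a`. The cost of the design shows up in the `instanceof` test: anything that reads `op` has to discover at run time which kind of object it received, which is one argument for simply rejecting such labels.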
> Regards,
> Jan
> On 02/20/2012 07:53 PM, Terence Parr wrote:
>> Oops. Prefix left-recursive alts weren't recognized when they had actions at the end.
>> https://github.com/parrt/antlr4/commit/7287f5a2d3719f992f34bfea5071c8d7d9c16ab5
>> Grab parrt/antlr4 again :)
>> Thanks,
>> Ter
>> On Feb 20, 2012, at 4:41 AM, Jan Finis wrote:
>>> On 02/19/2012 10:33 PM, Terence Parr wrote:
>>>> Hi. This surprises me. It translates to:
>>>> expr[int _p]
>>>>     :   ( ID '=' expr[3]
>>>>         | ID
>>>>         )
>>>>         ( {1>= $_p}? '+' expr[2]
>>>>         )*
>>>>     ;
>>>> (See the -Xlog option.) It's pretty hard for that to match as a=(a+a). Are you sure?
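Hand-executing the translation above shows why Ter doubts the reported parse. The sketch below is a hypothetical, hand-written recursive-descent version of that translated rule (not code generated by ANTLR); `PrecedenceSketch` and its toy tokenizer are illustrative, and it builds the same parenthesized strings as Jan's actions. If a parser really followed this translation, a=a+a would come out as ((a=a)+a): the nested expr[3] call cannot consume '+' because the predicate 1 >= 3 fails, so the '+' is picked up by the outer invocation.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of the translated rule (illustrative, not ANTLR output):
//   expr[int _p] : ( ID '=' expr[3] | ID ) ( {1 >= $_p}? '+' expr[2] )* ;
// The argument p plays the role of _p; the predicate 1 >= p decides whether
// this invocation may consume a '+' operator.
class PrecedenceSketch {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    PrecedenceSketch(String input) {
        // Toy tokenizer: identifiers and single-character operators, no whitespace.
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < input.length()
                        && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        tokens.add("<EOF>");
    }

    // Lookahead k tokens ahead, clamped to the <EOF> marker.
    private String la(int k) {
        return tokens.get(Math.min(pos + k - 1, tokens.size() - 1));
    }

    private String consume() {
        return tokens.get(pos++);
    }

    String expr(int p) {
        String result;
        String id = consume();                     // ID (not validated in this sketch)
        if (la(1).equals("=")) {                   // alt 1: ID '=' expr[3]
            consume();
            result = "(" + id + "=" + expr(3) + ")";
        } else {                                   // alt 2: ID
            result = id;
        }
        while (la(1).equals("+") && 1 >= p) {      // ( {1 >= $_p}? '+' expr[2] )*
            consume();
            result = "(" + result + "+" + expr(2) + ")";
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(new PrecedenceSketch("a=a+a").expr(0)); // ((a=a)+a)
    }
}
```

The same trace also shows left associativity: a+a+a yields ((a+a)+a), because each expr[2] call refuses a second '+' and leaves it to the caller's loop.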
>>> Hi Ter,
>>> I tested it again and was able to confirm the precedence bug. Here is 
>>> an example grammar that produces the bug:
>>> grammar TestGrammar;
>>> start returns [String result]
>>>  : expr {$result = $expr.result; }
>>>  ;
>>> expr returns [String result]
>>>    :   ID '=' e1=expr { $result = "(" + $ID.getText() + "=" + $e1.result + ")"; }
>>>    |   ID { $result = $ID.getText(); }
>>>    |   e1=expr '+' e2=expr { $result = "(" + $e1.result + "+" + $e2.result + ")"; }
>>>    ;
>>> ID  :    ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
>>>    ;
>>> This is the input:
>>> a=a+a
>>> The output is (a=(a+a)). With correct precedence the output should be 
>>> ((a=a)+a).
>>> I used the jar from
>>> http://antlr.org/download/antlr-4.0ea-complete.jar
>>> and redownloaded it to make sure that I do not have an outdated version.
>>> The output was produced using this code:
>>> import java.io.File;
>>> import java.io.FileInputStream;
>>> import org.antlr.v4.runtime.*;
>>> 
>>> TestGrammarLexer lex = new TestGrammarLexer(new ANTLRInputStream(new FileInputStream(new File("test.input"))));
>>> CommonTokenStream tokens = new CommonTokenStream(lex);
>>> TestGrammarParser.StartContext i = new TestGrammarParser(tokens).start();
>>> System.out.println(i.result);
>>>>> The precedence should be from top to bottom, right? So the input a=a+a
>>>>> should be parsed as (a=a)+a, since the assignment alternative is at the top.
>>>>> However, it is instead parsed as a=(a+a). Is this a bug, or
>>>>> am I interpreting something wrong?
>>>>> 2. Name binding
>>>>> Consider this example:
>>>>> expr returns [int r]
>>>>>     : '-' expr { $r = - $expr.r; }
>>>>> In my opinion, $expr in this example should bind to the sub-expression.
>>>>> However, it does not: since the rule itself is also named expr, $expr refers to
>>>>> the rule's own context instead of the context of the sub-expression. I think
>>>>> most of the time this is not what the user wants.
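The two bindings being contrasted can be made concrete with a simplified model of the action `$r = - $expr.r;`. `NameBindingSketch` and its `ExprContext` are hypothetical stand-ins, not ANTLR's generated classes; the `child` field represents the context of the nested expr. With the binding described above, the action negates the rule's own, still-unset return value; with the expected binding, it negates the sub-expression's value.

```java
// Hypothetical model of the two possible bindings for $expr in
//   expr returns [int r] : '-' expr { $r = - $expr.r; } ;
// ExprContext here is a simplified stand-in, not the class ANTLR generates.
class NameBindingSketch {
    static class ExprContext {
        int r;              // return value of this invocation of expr (defaults to 0)
        ExprContext child;  // context of the nested expr sub-expression
    }

    // Binding observed in the thread: $expr resolves to the enclosing rule's context.
    static void actionBoundToSelf(ExprContext localctx) {
        localctx.r = -localctx.r;        // reads this rule's own r, which is still 0
    }

    // Binding the user usually wants: $expr resolves to the sub-expression's context.
    static void actionBoundToChild(ExprContext localctx) {
        localctx.r = -localctx.child.r;  // negates the value computed by the nested expr
    }

    public static void main(String[] args) {
        ExprContext sub = new ExprContext();
        sub.r = 5;                        // e.g. the nested expr matched "5"

        ExprContext a = new ExprContext();
        a.child = sub;
        actionBoundToSelf(a);
        System.out.println(a.r);          // 0

        ExprContext b = new ExprContext();
        b.child = sub;
        actionBoundToChild(b);
        System.out.println(b.r);          // -5
    }
}
```

Labeling the reference, as the earlier test grammar already does with e1=expr and e2=expr, sidesteps the ambiguity: `'-' e=expr { $r = - $e.r; }`.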
>>>> I think this is consistent with v3. I'll add it to the list of things to think about. Thanks!
>>>> Ter
>>> Yes, it is consistent with v3; however, v3 didn't have these crazy 
>>> left-recursive rules :).
>>> With these rules, it is much more common for an alternative to reference 
>>> the rule itself.
