[antlr-interest] Some bugs (or features?) in Honey Badger
Terence Parr
parrt at cs.usfca.edu
Wed Feb 22 13:40:53 PST 2012
hi. that's basically a bug. :) It should not allow rules that are simple sets of tokens.
Ter
On Feb 21, 2012, at 4:20 AM, Jan Finis wrote:
> Works fine now, thanks!
>
> *Another strange thing I encountered:*
>
> The "good" case:
>
> expr
> : ID op=('=' | '+=' | '-=') expr
> ;
>
> In this case op becomes the Token representing the matched operator,
> which is fine and what the user intended.
>
> The "bad" case:
> expr
> : ID op=('=' | '+=' | expr) expr
> ;
>
> In this (strange) case op is still of type Token. Regardless of the
> matched alternative, it is initialized with _input.LT(1); (which was
> correct in the good case).
> If the third alternative (expr) is matched, that does not make any sense.
>
> It is not clear what the user intended in this case, but I think he
> wanted to save the Token/context of the matching alternative into op.
> So, op should be an Object (in Java) so that it can hold either an
> ExprContext or a Token (or contexts and tokens should receive a common
> superclass/interface and op should be of that class/interface). Op should be
> initialized with the matched context/token instead of LT(1).
> Another alternative would be to completely forbid things like that, but
> I think it could be handy sometimes.
>
> Code generated now (btw. the cast is superfluous, _localctx is an
> ExprContext) :
>
> public static class ExprContext extends ParserRuleContext<Token> {
> public Token op;
>
> ...
>
> ((ExprContext)_localctx).op = _input.LT(1);
> switch ( getInterpreter().adaptivePredict(_input,0,_ctx) ) {
> case 1:
> {
> setState(10); match(5);
> }
> break;
> case 2:
> {
> setState(12); match(3);
> }
> break;
> case 3:
> {
> setState(14); expr(3);
> }
> break;
> }
>
> How it could look:
>
> public static class ExprContext extends ParserRuleContext<Token> {
> // TokenOrContext is an interface implemented by Tokens and contexts, for example
> public TokenOrContext op;
>
> ...
>
> switch ( getInterpreter().adaptivePredict(_input,0,_ctx) ) {
> case 1:
> {
> setState(10); _localctx.op = match(5);
> }
> break;
> case 2:
> {
> setState(12); _localctx.op = match(3);
> }
> break;
> case 3:
> {
> setState(14); _localctx.op = expr(3);
> }
> break;
> }
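
The proposal above can be tried out without ANTLR. The following is a minimal sketch, assuming a common interface for tokens and rule contexts; TokenOrContext is the name suggested above, while SketchToken, SketchExprContext and describe are hypothetical stand-ins invented for illustration and are not ANTLR runtime classes.

```java
// Hypothetical sketch only: none of these types exist in the ANTLR runtime.
final class TokenOrContextSketch {
    // Assumed common supertype for tokens and rule contexts.
    interface TokenOrContext {
        String getText();
    }

    // Stand-in for a lexer token.
    static final class SketchToken implements TokenOrContext {
        private final String text;
        SketchToken(String text) { this.text = text; }
        @Override public String getText() { return text; }
    }

    // Stand-in for a rule context such as ExprContext.
    static final class SketchExprContext implements TokenOrContext {
        TokenOrContext op; // assigned by whichever alternative matched
        private final String text;
        SketchExprContext(String text) { this.text = text; }
        @Override public String getText() { return text; }
    }

    // A consumer of the label has to check which kind of element it holds.
    static String describe(TokenOrContext op) {
        if (op instanceof SketchToken) {
            return "token " + op.getText();
        }
        return "context " + op.getText();
    }

    public static void main(String[] args) {
        SketchExprContext ctx = new SketchExprContext("a+=b");
        ctx.op = new SketchToken("+=");      // alternatives 1 and 2: a token
        System.out.println(describe(ctx.op));
        ctx.op = new SketchExprContext("b"); // alternative 3: a sub-context
        System.out.println(describe(ctx.op));
    }
}
```

The cost of this design is visible in describe: every user of op needs a type check before doing anything specific to tokens or contexts.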
>
> Regards,
> Jan
>
>
>
> On 02/20/2012 07:53 PM, Terence Parr wrote:
>> Oops. prefix left-recursive alts weren't recognized with actions on end.
>>
>> https://github.com/parrt/antlr4/commit/7287f5a2d3719f992f34bfea5071c8d7d9c16ab5
>>
>> grab parrt/antlr4 again :)
>>
>> Thanks,
>> Ter
>> On Feb 20, 2012, at 4:41 AM, Jan Finis wrote:
>>
>>> On 02/19/2012 10:33 PM, Terence Parr wrote:
>>>>
>>>> Hi. This surprises me. It translates to:
>>>>
>>>> expr[int _p]
>>>> : ( ID '=' expr[3]
>>>> | ID
>>>> )
>>>> ( {1>= $_p}? '+' expr[2]
>>>> )*
>>>> ;
>>>>
>>>> (See -Xlog option). Pretty hard for that to match as a=(a+a). are you sure?
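
The translated rule above can be traced by hand with a small recursive-descent sketch. It is an illustration written for this purpose, not ANTLR-generated code, and it makes some assumptions: tokens are single characters, the outermost call passes a precedence of 0, and the class and method names (PrecedenceSketch, la, parse) are hypothetical.

```java
// Hand-written sketch of the translated rule; not ANTLR output.
final class PrecedenceSketch {
    private final String input; // single-character tokens, e.g. "a=a+a"
    private int pos = 0;

    PrecedenceSketch(String input) { this.input = input; }

    // Character at lookahead offset k (1-based), or '\0' past the end of input.
    private char la(int k) {
        int i = pos + k - 1;
        return i < input.length() ? input.charAt(i) : '\0';
    }

    // expr[int _p] : ( ID '=' expr[3] | ID ) ( {1 >= $_p}? '+' expr[2] )* ;
    String expr(int p) {
        String result;
        char id = input.charAt(pos++);      // ID
        if (la(1) == '=') {                 // first primary alternative
            pos++;                          // consume '='
            result = "(" + id + "=" + expr(3) + ")";
        } else {                            // second primary alternative
            result = String.valueOf(id);
        }
        // The predicate {1 >= $_p}? gates the '+' loop.
        while (la(1) == '+' && 1 >= p) {
            pos++;                          // consume '+'
            result = "(" + result + "+" + expr(2) + ")";
        }
        return result;
    }

    // Assumption: the outermost invocation uses precedence 0.
    static String parse(String input) {
        return new PrecedenceSketch(input).expr(0);
    }

    public static void main(String[] args) {
        System.out.println(parse("a=a+a"));
    }
}
```

Under these assumptions the nested call expr(3) cannot enter the '+' loop because 1 >= 3 is false, so the sketch groups a=a+a as ((a=a)+a) rather than (a=(a+a)).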
>>>
>>> Hi Ter,
>>>
>>> I tested it again and was able to confirm the precedence bug, here is
>>> an example grammar producing the bug:
>>>
>>> grammar TestGrammar;
>>>
>>> start returns [String result]
>>> : expr {$result = $expr.result; }
>>> ;
>>>
>>> expr returns [String result]
>>> : ID '=' e1=expr { $result = "(" + $ID.getText() + "=" +
>>> $e1.result + ")"; }
>>> | ID { $result = $ID.getText(); }
>>> | e1=expr '+' e2=expr { $result = "(" + $e1.result + "+" +
>>> $e2.result + ")"; }
>>> ;
>>>
>>> ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
>>> ;
>>>
>>> This is the input:
>>>
>>> a=a+a
>>>
>>> The output is (a=(a+a)). With correct precedence the output should be
>>> ((a=a)+a).
>>>
>>> I used the jar from
>>>
>>> http://antlr.org/download/antlr-4.0ea-complete.jar
>>>
>>> and redownloaded it to make sure that I do not have an outdated version.
>>>
>>> The output was produced using this code:
>>>
>>> TestGrammarLexer lex = new TestGrammarLexer(new ANTLRInputStream(new
>>> FileInputStream(new File("test.input"))));
>>> CommonTokenStream tokens = new CommonTokenStream(lex);
>>>
>>> StartContext i = new TestGrammarParser(tokens).start();
>>>
>>> System.out.println(i.result);
>>>
>>>>> the precedence should be from top to bottom, right? So, the input a=a+a
>>>>> should be parsed as (a=a)+a, since the assignment rule is on the top.
>>>>> However, this is not the case, instead, it is parsed as a=(a+a). Bug, or
>>>>> am I interpreting something wrong?
>>>>>
>>>>> 2. Name binding
>>>>>
>>>>> Consider this example:
>>>>>
>>>>> expr returns [int r]
>>>>> : '-' expr { $r = - $expr.r; }
>>>>>
>>>>> In this example $expr should bind to the sub-expression in my opinion.
>>>>> However, it does not. Since the rule is also named expr, $expr refers to
>>>>> the rule context instead of the context of the sub-expression. I think
>>>>> most of the time this is not what the user wants.
>>>> I think this is consistent with v3. i'll add to list to think about. thanks!
>>>> Ter
>>>
>>> Yes, it is consistent with v3, however v3 didn't have these crazy
>>> left recursive rules :).
>>> With these rules, it is much more common to have a non-terminal of
>>> the same type as the rule itself.
>>>
>>
>
>