[antlr-interest] Understanding (and tweaking) DFAs

Edson Tirelli ed.tirelli at gmail.com
Fri Jan 21 19:05:09 PST 2011


   Just to follow up on this, I found one way to work around this problem: I
added AT to the rule:

assignmentOperator
    :   EQUALS_ASSIGN
    |   PLUS_ASSIGN
    |   MINUS_ASSIGN
    |   MULT_ASSIGN
    |   DIV_ASSIGN
    |   AND_ASSIGN
    |   OR_ASSIGN
    |   XOR_ASSIGN
    |   MOD_ASSIGN
    |   LESS LESS EQUALS_ASSIGN
    |   (GREATER GREATER GREATER)=> GREATER GREATER GREATER EQUALS_ASSIGN
    |   (GREATER GREATER)=> GREATER GREATER EQUALS_ASSIGN
    |   AT
    ;

    Since I never invoke this rule in my current work, I don't mind leaving
the AT there for now as a workaround. But this is exactly what bothers me:
why does ANTLR fail like that, and why does changing a rule that is not even
called in the rule chain of conditionalExpression change the behavior and
"fix the problem"?!

    BTW, this is ANTLR 3.3.
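
    To make the gap in the generated DFA29 test (quoted below) concrete,
here is a minimal, self-contained Java sketch of that range check. It is
only an illustration, not the generated code: NULL=19, AT=20 and
PLUS_ASSIGN=21 come from the token file, while the values for BOOL and INCR
and the name takesTopBranch are assumptions made up for this example, and
the parts elided with "..." in the generated condition are left out.

```java
// Sketch of the token-type range test from the generated DFA29 code.
class Dfa29RangeCheck {
    // Token types taken from the token file excerpt.
    static final int NULL = 19;
    static final int AT = 20;
    static final int PLUS_ASSIGN = 21;

    // ASSUMED values: the token file excerpt does not show these two.
    static final int BOOL = 10; // hypothetical
    static final int INCR = 40; // hypothetical

    // Same shape as the quoted condition: two closed ranges of token types.
    // AT sits exactly between the two ranges, so it matches neither.
    static boolean takesTopBranch(int la) {
        return (la >= BOOL && la <= NULL)
            || (la >= PLUS_ASSIGN && la <= INCR);
    }

    public static void main(String[] args) {
        System.out.println("NULL        -> " + takesTopBranch(NULL));
        System.out.println("AT          -> " + takesTopBranch(AT));
        System.out.println("PLUS_ASSIGN -> " + takesTopBranch(PLUS_ASSIGN));
    }
}
```

    With these values, NULL and PLUS_ASSIGN pass the test and AT does not,
which matches the hole at token type 20 between the two ranges.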

    Thanks,
      Edson



2011/1/21 Edson Tirelli <ed.tirelli at gmail.com>

>
>    Hi all,
>
>    In my long quest to understand the reason behind some of antlr's
> decisions, I am stuck with the following problem.
>
>    I am using this lexer:
>
>
> https://github.com/etirelli/droolsjbpm/blob/JBRULES-2642/drools-compiler/src/main/resources/org/drools/lang/DRLLexer.g
>
>    And this grammar:
>
>
> https://github.com/etirelli/droolsjbpm/blob/JBRULES-2642/drools-compiler/src/main/resources/org/drools/lang/DRLExpressions.g
>
>    The grammar is just an expression grammar extracted from the java
> grammar with some tweaks.
>
>    Now the problem: imagine I have a long input stream, and at a given
> point in time, my stream looks like this:
>
> input := "foo" COMMA ...
>
>    Where "foo" is a string token, and COMMA is the token for ",".
>
>    At this point I invoke the conditionalExpression rule, which correctly
> parses "foo" as an expression (a simple expression that is just a string)
> and cleanly stops parsing, as the next token (COMMA) is not part of the
> expression. It works as expected, and some other rule will consume the
> COMMA.
>
>    Now, if my input stream is (where AT is the token for "@"):
>
> input := "foo" AT ...
>
>    Instead of doing exactly what it did before, it raises a
> NoViableAltException at the "@", which is not what I want. I tested with
> several different tokens on the input stream, like "!", or ";", or "ID",
> etc., and they all work as expected, but if an @ shows up, it blows up.
>
>    Now, debugging the generated code I found the culprit on this rule:
>
> unaryExpressionNotPlusMinus
>     :   TILDE unaryExpression
>     |   NEGATION unaryExpression
>     |   (castExpression)=>castExpression
>     |   primary ((selector)=>selector)* ((INCR|DECR)=> (INCR|DECR))?
>     ;
>
>    The ((selector)=>selector)* part generates the DFA29 (antlrworks
> generated graph attached) where you can see the top branch containing the
> relevant tokens, but explicitly excluding @. From the token file:
>
> ...
> NULL=19
> AT=20
> PLUS_ASSIGN=21
> ...
>
>     From the generated code for the DFA29:
>
> ...
> if( ...(LA29_0>=BOOL && LA29_0<=NULL)||(LA29_0>=PLUS_ASSIGN && LA29_0<=INCR)... )
>    s = 1;
>
>     I tried blindly tweaking rules to fix it or at least understand why
> ANTLR is not including AT on this branch of the DFA, but had no success.
>
>     So my question is: is it possible to manually tweak the DFA to achieve
> the expected result? (Other than, obviously, editing the generated code
> by hand, which would make long-term maintenance hell.)
>
>     Or maybe is there another way to work around this problem?
>
>     Sorry for the long e-mail and thanks in advance.
>
>      Edson
>
> --
>   Edson Tirelli
>   JBoss Drools Core Development
>   JBoss by Red Hat @ www.jboss.com
>



-- 
  Edson Tirelli
  JBoss Drools Core Development
  JBoss by Red Hat @ www.jboss.com

