[antlr-interest] Understanding (and tweaking) DFAs
Edson Tirelli
ed.tirelli at gmail.com
Fri Jan 21 19:05:09 PST 2011
Just to follow up on this, I found one way to work around this problem: I
added AT to the rule:
assignmentOperator
: EQUALS_ASSIGN
| PLUS_ASSIGN
| MINUS_ASSIGN
| MULT_ASSIGN
| DIV_ASSIGN
| AND_ASSIGN
| OR_ASSIGN
| XOR_ASSIGN
| MOD_ASSIGN
| LESS LESS EQUALS_ASSIGN
| (GREATER GREATER GREATER)=> GREATER GREATER GREATER EQUALS_ASSIGN
| (GREATER GREATER)=> GREATER GREATER EQUALS_ASSIGN
| AT
;
Since my current work never invokes this rule, I don't mind leaving the
AT there for now as a workaround. But this is exactly what bothers me: why
does ANTLR fail like that, and why does changing a rule that is not even in
the call chain of conditionalExpression change the behavior and "fix the
problem"?
BTW, this is ANTLR 3.3.
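
To make the gap concrete, here is a minimal Java sketch of the range test
quoted further down from the generated DFA29 code. NULL=19, AT=20 and
PLUS_ASSIGN=21 come from the token file excerpt; the values for BOOL and
INCR are not shown there, so the ones used here are assumptions, as are the
class and method names. It only illustrates that token type 20 falls
between the two ranges; it does not explain why ANTLR builds the DFA that
way.

```java
// Illustrative sketch only; not the actual generated parser code.
class Dfa29RangeSketch {
    // Values taken from the token file excerpt in the quoted message.
    static final int NULL = 19;
    static final int AT = 20;
    static final int PLUS_ASSIGN = 21;

    // ASSUMED values: the excerpt does not show these token types.
    static final int BOOL = 4;
    static final int INCR = 60;

    // Mirrors the two ranges visible in the quoted DFA29 condition.
    static boolean inQuotedRanges(int la) {
        return (la >= BOOL && la <= NULL)
            || (la >= PLUS_ASSIGN && la <= INCR);
    }

    public static void main(String[] args) {
        System.out.println("NULL        -> " + inQuotedRanges(NULL));
        System.out.println("AT          -> " + inQuotedRanges(AT));
        System.out.println("PLUS_ASSIGN -> " + inQuotedRanges(PLUS_ASSIGN));
    }
}
```

Whatever BOOL and INCR actually are, AT sits between NULL and PLUS_ASSIGN,
so neither range can match it.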
Thanks,
Edson
2011/1/21 Edson Tirelli <ed.tirelli at gmail.com>
>
> Hi all,
>
> In my long quest to understand the reason behind some of antlr's
> decisions, I am stuck with the following problem.
>
> I am using this lexer:
>
>
> https://github.com/etirelli/droolsjbpm/blob/JBRULES-2642/drools-compiler/src/main/resources/org/drools/lang/DRLLexer.g
>
> And this grammar:
>
>
> https://github.com/etirelli/droolsjbpm/blob/JBRULES-2642/drools-compiler/src/main/resources/org/drools/lang/DRLExpressions.g
>
> The grammar is just an expression grammar extracted from the java
> grammar with some tweaks.
>
> Now the problem: imagine I have a long input stream, and at a given
> point in time, my stream looks like this:
>
> input := "foo" COMMA ...
>
> Where "foo" is a string token, and COMMA is the token for ",".
>
> At this point I invoke the conditionalExpression rule, which correctly
> parses "foo" as an expression (a simple expression that is just a string)
> and cleanly stops parsing, since the next token (COMMA) is not part of the
> expression. It works as expected, and some other rule will consume the
> COMMA.
>
> Now, if my input stream is (where AT is the token for "@"):
>
> input := "foo" AT ...
>
> Instead of doing exactly what it did before, it raises a
> NoViableAltException at the "@", which is not what I want. I tested with
> several different tokens in the input stream, like "!", ";", "ID",
> etc., and they all work as expected, but if an @ shows up, it blows up.
>
> Now, debugging the generated code, I found the culprit in this rule:
>
> unaryExpressionNotPlusMinus
> : TILDE unaryExpression
> | NEGATION unaryExpression
> | (castExpression)=>castExpression
> | primary ((selector)=>selector)* ((INCR|DECR)=> (INCR|DECR))?
> ;
>
> The ((selector)=>selector)* part generates DFA29 (ANTLRWorks-generated
> graph attached), where you can see the top branch containing the
> relevant tokens but explicitly excluding @. From the token file:
>
> ...
> NULL=19
> AT=20
> PLUS_ASSIGN=21
> ...
>
> From the generated code for the DFA29:
>
> ...
> if( ...(LA29_0>=BOOL && LA29_0<=NULL)||(LA29_0>=PLUS_ASSIGN && LA29_0<=INCR)... )
> s = 1;
>
> I tried blindly tweaking rules to fix it, or at least to understand why
> ANTLR is not including AT on this branch of the DFA, but had no success.
>
> So my question is: is it possible to manually tweak the DFA to achieve
> the expected result (other than editing the generated code by hand,
> which would make long-term maintenance hell)?
>
> Or maybe is there another way to work around this problem?
>
> Sorry for the long e-mail and thanks in advance.
>
> Edson
>
> --
> Edson Tirelli
> JBoss Drools Core Development
> JBoss by Red Hat @ www.jboss.com
>
--
Edson Tirelli
JBoss Drools Core Development
JBoss by Red Hat @ www.jboss.com