[antlr-interest] Understanding (and tweaking) DFAs

Edson Tirelli ed.tirelli at gmail.com
Fri Jan 21 18:32:39 PST 2011


   Hi all,

   In my long quest to understand the reasons behind some of ANTLR's
decisions, I am stuck on the following problem.

   I am using this lexer:

https://github.com/etirelli/droolsjbpm/blob/JBRULES-2642/drools-compiler/src/main/resources/org/drools/lang/DRLLexer.g

   And this grammar:

https://github.com/etirelli/droolsjbpm/blob/JBRULES-2642/drools-compiler/src/main/resources/org/drools/lang/DRLExpressions.g

   The grammar is just an expression grammar extracted from the Java grammar,
with some tweaks.

   Now the problem: imagine I have a long input stream, and at a given point
in time, my stream looks like this:

input := "foo" COMMA ...

   Where "foo" is a string token, and COMMA is the token for ",".

   At this point I invoke the conditionalExpression rule, which correctly
parses "foo" as an expression (a simple expression that is just a string)
and stops parsing, because the next token (COMMA) is not part of the
expression. This works as expected, and some other rule will consume the
COMMA.

   Now, if my input stream is (where AT is the token for "@"):

input := "foo" AT ...

   Instead of doing exactly what it did before, the parser raises a
NoViableAltException at the "@", which is not what I want. I tested with
several other tokens in that position, such as "!", ";", or an ID, and
they all work as expected, but if an @ shows up, it blows up.

   Now, debugging the generated code, I found the culprit in this rule:

unaryExpressionNotPlusMinus
    :   TILDE unaryExpression
    |   NEGATION unaryExpression
    |   (castExpression)=>castExpression
    |   primary ((selector)=>selector)* ((INCR|DECR)=> (INCR|DECR))?
    ;

   The ((selector)=>selector)* part generates DFA29 (ANTLRWorks-generated
graph attached), where you can see the top branch containing the
relevant tokens, but explicitly excluding @. From the token file:

...
NULL=19
AT=20
PLUS_ASSIGN=21
...

    From the generated code for DFA29:

...
if ( ...(LA29_0>=BOOL && LA29_0<=NULL)||(LA29_0>=PLUS_ASSIGN && LA29_0<=INCR)... )
    s = 1;
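
    To make the gap concrete, here is a minimal, standalone sketch of just
the two range tests quoted above. It is not the generated parser code: the
class and method names are made up for illustration, and the values used for
BOOL and INCR are hypothetical, since the token file excerpt only shows NULL,
AT and PLUS_ASSIGN. The elided parts of the real condition are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class Dfa29RangeSketch {
    // Values taken from the token file excerpt in this message.
    static final int NULL = 19;
    static final int AT = 20;
    static final int PLUS_ASSIGN = 21;

    // Hypothetical values: the excerpt does not show them.
    static final int BOOL = 4;
    static final int INCR = 40;

    // Mirrors only the two quoted range tests; the "..." parts are omitted.
    static boolean matchesQuotedRanges(int la) {
        return (la >= BOOL && la <= NULL) || (la >= PLUS_ASSIGN && la <= INCR);
    }

    // Collects the token types in [lo, hi] that the quoted ranges skip.
    static List<Integer> gaps(int lo, int hi) {
        List<Integer> skipped = new ArrayList<>();
        for (int type = lo; type <= hi; type++) {
            if (!matchesQuotedRanges(type)) {
                skipped.add(type);
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        System.out.println("NULL matches:        " + matchesQuotedRanges(NULL));
        System.out.println("AT matches:          " + matchesQuotedRanges(AT));
        System.out.println("PLUS_ASSIGN matches: " + matchesQuotedRanges(PLUS_ASSIGN));
        System.out.println("Skipped token types: " + gaps(BOOL, INCR));
    }
}
```

    Because the first range ends at NULL (19) and the second starts at
PLUS_ASSIGN (21), AT (20) is the one token type between the two bounds that
neither range accepts, whatever the real values of BOOL and INCR are.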

    I tried blindly tweaking rules to fix it or at least understand why
ANTLR is not including AT on this branch of the DFA, but had no success.

    So my question is: is it possible to manually tweak the DFA to achieve
the expected result? (Other than changing the generated code by hand,
obviously, as that would make long-term maintenance hell.)

    Or is there maybe another way to work around this problem?

    Sorry for the long e-mail and thanks in advance.

     Edson

-- 
  Edson Tirelli
  JBoss Drools Core Development
  JBoss by Red Hat @ www.jboss.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DFA29.png
Type: image/png
Size: 38268 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20110121/02d73a94/attachment.png 

