[antlr-interest] Issue while building objects at the parsing stage.

Atul Dambalkar atul at entrib.com
Mon Aug 6 04:43:44 PDT 2012


Hi,

Please take a look at the following parser grammar. It parses logical,
relational and arithmetic expressions, so it has the usual layered rules
for operator precedence, written without left recursion. In the parser I
have added Java actions (which should be fairly self-explanatory) that
construct a nested expression object once the entire expression is parsed.
I am also using the tree-construction operators in the grammar, but those
can be ignored.

An expression like this one is parsed successfully:

    tag2 >= (435 * ----12 + (12 ---23)) && tag1 starts "200" || tag3 starts "200"

After parsing, the tree is printed as:

    (|| (&& (>= tag2 (+ (* 435 - - - - 12) (- 12 - - 23))) (starts tag1 "200")) (starts tag3 "200"))

But the nested data structure I build skips one complete sub-tree, the one
rooted at the && node. The same && node does get added if I change the
expression to:

    (tag2 >= (435 * ----12 + (12 ---23)) && tag1 starts "200") || tag3 starts "200"

Note that I only added a left and a right parenthesis around the &&
sub-expression.

After looking into the generated code a little, I realized that the &&
expression node object gets built correctly, but it is later overwritten by
one of the node objects for "starts". I believe the reason is the
"do-while" loop in the generated code, which iterates while parsing the
expression (I guess because the rules are written without left recursion).
Only when I add the left and right parentheses does the parse take a
different route through the grammar; the && node is then constructed
through that route and hence gets added to the nested expression object.
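
To make the suspected overwrite concrete, here is a minimal, self-contained
Java sketch of what the loop actions appear to do. Node, Step, buildAsPosted
and buildLeftFold are hypothetical stand-ins invented for illustration; they
are not the generated parser code or the real expression classes. The
assumption is that the action at the top of the ( ... )* loop runs on every
pass and always takes the first operand as its left side.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LoopOverwriteSketch {

    /** Hypothetical stand-in for a binary node such as LogicalExpression. */
    static final class Node {
        final Object left;
        final String op;
        final Object right;

        Node(Object left, String op, Object right) {
            this.left = left;
            this.op = op;
            this.right = right;
        }

        @Override
        public String toString() {
            return "(" + op + " " + left + " " + right + ")";
        }
    }

    /** One pass of the ( operator operand )* loop. */
    static final class Step {
        final String op;
        final Object operand;

        Step(String op, Object operand) {
            this.op = op;
            this.operand = operand;
        }
    }

    /** Assumed behaviour of the posted actions: the left side is always the first operand. */
    static Object buildAsPosted(Object first, List<Step> steps) {
        Object result = first;
        for (Step step : steps) {
            // The node built on the previous pass is dropped here.
            result = new Node(first, step.op, step.operand);
        }
        return result;
    }

    /** Left fold: the left side is whatever has been built so far. */
    static Object buildLeftFold(Object first, List<Step> steps) {
        Object result = first;
        for (Step step : steps) {
            result = new Node(result, step.op, step.operand);
        }
        return result;
    }

    public static void main(String[] args) {
        // a && b || c, seen by one rule invocation as two loop passes
        List<Step> flat = Arrays.asList(new Step("&&", "b"), new Step("||", "c"));
        System.out.println(buildAsPosted("a", flat)); // && node is lost
        System.out.println(buildLeftFold("a", flat)); // && node is kept

        // (a && b) || c: the parenthesized part is built by a nested invocation
        Object inner = buildAsPosted("a", Collections.singletonList(new Step("&&", "b")));
        System.out.println(buildAsPosted(inner, Collections.singletonList(new Step("||", "c"))));
    }
}
```

Under that assumption the flat input loses the && node, while the
parenthesized input keeps it, because the && part is built by a nested rule
invocation and reaches the outer loop as a single operand.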

Can someone help me fix specifically the part where the recursive logical
expression is built?

Thanks, I appreciate the help.

================================
// Author: Atul Dambalkar (atul at entrib.com)

parser grammar TParser;

options {

    // Default language but name it anyway
    language  = Java;

    // output as AST
    output = AST;

    // Use a superclass to implement all helper methods, instance variables
    // and overrides of ANTLR default methods, such as error handling.
    //
    superClass = AbstractTParser;

    // Use the vocabulary generated by the accompanying lexer. Maven knows
    // how to work out the relationship between the lexer and parser and
    // will build the lexer before the parser. It will also rebuild the
    // parser if the lexer changes.
    tokenVocab = TLexer;
}

// Some imaginary tokens for tree rewrites
tokens {
    UNARYNOT;
    UNARYMINUS;
    UNARYEXISTS;
}

// What package should the generated source exist in?
@header {
    package com.entrib.poc.antlr;
    import com.entrib.poc.antlr.converter.type.*;
    import com.entrib.poc.antlr.converter.expr.*;
}

prog returns [Expression expression]
    : expr { $expression = $expr.expression; } EOF
    ;

expr returns [Expression expression]
    : relationalExpression1 { $expression = $relationalExpression1.expression; }
    (
        {
            $expression = new LogicalExpression();
            ((LogicalExpression)$expression).setExpression1($relationalExpression1.expression);
        }
        (
            AND^
            { ((LogicalExpression)$expression).setLogicalOperatorEnum(LogicalExpression.LogicalOperatorEnum.AND); }
            |
            OR^
            { ((LogicalExpression)$expression).setLogicalOperatorEnum(LogicalExpression.LogicalOperatorEnum.OR); }
        )
        relationalExpression2
        { ((LogicalExpression)$expression).setExpression2($relationalExpression2.expression); }
    )*
    ;

relationalExpression1 returns [Expression expression]
    : relationalExpression { $expression = $relationalExpression.expression; }
    ;

relationalExpression2 returns [Expression expression]
    : relationalExpression { $expression = $relationalExpression.expression; }
    ;

valueList
    : LB!
        (primitiveValue)
        (COMMA primitiveValue)*
      RB!
    ;

primitiveValue returns [Expression expression]
    : INT { $expression = createValueObject(ValueType.INT, $INT.text); }
    |
    FLOAT { $expression = createValueObject(ValueType.FLOAT, $FLOAT.text); }
    |
    TIMESTAMP { $expression = createValueObject(ValueType.DATE, $TIMESTAMP.text); }
    |
    STRING { $expression = createValueObject(ValueType.STRING, $STRING.text); }
    |
    ID { $expression = createValueObject(ValueType.ID, $ID.text); }
    |
    TAG { $expression = createValueObject(ValueType.TAG, $TAG.text); }
    |
    LP! expr { $expression = $expr.expression; } RP!
    ;

expr1
    : expr
    ;

expr2
    : expr
    ;

primitiveElement returns [Expression expression]
     : primitiveValue { $expression = $primitiveValue.expression; }
     |
     valueList
     ;

existsExpression returns [Expression expression]
     : EXISTS TAG { $expression = createExistsExpression($TAG.text); }
     ;

relationalExpression returns [Expression expression]
     : arithExpression1 { $expression = $arithExpression1.expression; }
     (
         {
             $expression = new RelationalExpression();
             ((RelationalExpression)$expression).setExpression1($arithExpression1.expression);
         }
         (
             EQ^
             { ((RelationalExpression)$expression).setRelationalOperatorEnum(RelationalExpression.RelationalOperatorEnum.EQ); }
             |
             NEQ^
             { ((RelationalExpression)$expression).setRelationalOperatorEnum(RelationalExpression.RelationalOperatorEnum.NEQ); }
             |
             GT^
             { ((RelationalExpression)$expression).setRelationalOperatorEnum(RelationalExpression.RelationalOperatorEnum.GT); }
             |
             GTE^
             { ((RelationalExpression)$expression).setRelationalOperatorEnum(RelationalExpression.RelationalOperatorEnum.GTE); }
             |
             LT^
             { ((RelationalExpression)$expression).setRelationalOperatorEnum(RelationalExpression.RelationalOperatorEnum.LT); }
             |
             LTE^
             { ((RelationalExpression)$expression).setRelationalOperatorEnum(RelationalExpression.RelationalOperatorEnum.LTE); }
             |
             IN^
             { ((RelationalExpression)$expression).setRelationalOperatorEnum(RelationalExpression.RelationalOperatorEnum.IN); }
             |
             NIN^
             { ((RelationalExpression)$expression).setRelationalOperatorEnum(RelationalExpression.RelationalOperatorEnum.NIN); }
             |
             CONTAINS^
             { ((RelationalExpression)$expression).setRelationalOperatorEnum(RelationalExpression.RelationalOperatorEnum.CONTAINS); }
             |
             STARTS^
             { ((RelationalExpression)$expression).setRelationalOperatorEnum(RelationalExpression.RelationalOperatorEnum.STARTS); }
             |
             ENDS^
             { ((RelationalExpression)$expression).setRelationalOperatorEnum(RelationalExpression.RelationalOperatorEnum.ENDS); }
             |
             ALL^
             { ((RelationalExpression)$expression).setRelationalOperatorEnum(RelationalExpression.RelationalOperatorEnum.ALL); }
             |
             NOR^
             { ((RelationalExpression)$expression).setRelationalOperatorEnum(RelationalExpression.RelationalOperatorEnum.NOR); }
         )
         arithExpression2
         { ((RelationalExpression)$expression).setExpression2($arithExpression2.expression); }
     )*
     ;

arithExpression1 returns [Expression expression]
    : arithExpression { $expression = $arithExpression.expression; }
    ;

arithExpression2 returns [Expression expression]
    : arithExpression { $expression = $arithExpression.expression; }
    ;

arithExpression returns [Expression expression]
     : multiplyingExpression1 { $expression = $multiplyingExpression1.expression; }
     (
         {
             $expression = new ArithExpression();
             ((ArithExpression)$expression).setExpression1($multiplyingExpression1.expression);
         }
         (
             PLUS^
             { ((ArithExpression)$expression).setArithOperatorEnum(ArithExpression.ArithOperatorEnum.PLUS); }
             |
             MINUS^
             { ((ArithExpression)$expression).setArithOperatorEnum(ArithExpression.ArithOperatorEnum.MINUS); }
         )
         multiplyingExpression2
         { ((ArithExpression)$expression).setExpression2($multiplyingExpression2.expression); }
     )*
     ;

multiplyingExpression1 returns [Expression expression]
    : multiplyingExpression { $expression = $multiplyingExpression.expression; }
    ;

multiplyingExpression2 returns [Expression expression]
    : multiplyingExpression { $expression = $multiplyingExpression.expression; }
    ;

signExpression returns [Expression expression]
     : { $expression = new MinusExpression(); }
     MINUS signExpression1
     { ((MinusExpression)$expression).setExpression($signExpression1.expression); }
     | existsExpression { $expression = $existsExpression.expression; }
     | primitiveElement { $expression = $primitiveElement.expression; }
     ;

multiplyingExpression returns [Expression expression]
     : signExpression1 { $expression = $signExpression1.expression; }
     (
         {
             $expression = new MultiplyingExpression();
             ((MultiplyingExpression)$expression).setExpression1($signExpression1.expression);
         }
         (
             MULTI^
             { ((MultiplyingExpression)$expression).setMultiplyingOperatorEnum(MultiplyingExpression.MultiplyingOperatorEnum.MULTI); }
             |
             DIV^
             { ((MultiplyingExpression)$expression).setMultiplyingOperatorEnum(MultiplyingExpression.MultiplyingOperatorEnum.DIV); }
             |
             MOD^
             { ((MultiplyingExpression)$expression).setMultiplyingOperatorEnum(MultiplyingExpression.MultiplyingOperatorEnum.MOD); }
         )
         signExpression2
         { ((MultiplyingExpression)$expression).setExpression2($signExpression2.expression); }
     )*
     ;

signExpression1 returns [Expression expression]
    : signExpression { $expression = $signExpression.expression; }
    ;

signExpression2 returns [Expression expression]
    : signExpression { $expression = $signExpression.expression; }
    ;
======================================================
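
One possible way to keep earlier nodes is to build each new node from the
value accumulated so far. The sketch below is an illustration under stated
assumptions, not a tested change to the grammar: combineLogical is a
hypothetical helper (it could live in a superclass such as AbstractTParser),
and Expression, Value and LogicalExpression here are minimal stand-ins that
only mirror the setter names used in the actions above.

```java
public class LeftAssociativeBuildSketch {

    /** Stand-in for the Expression type used in the grammar actions. */
    interface Expression {}

    /** Stand-in leaf value, used only to make the example printable. */
    static final class Value implements Expression {
        final String text;

        Value(String text) {
            this.text = text;
        }

        @Override
        public String toString() {
            return text;
        }
    }

    /** Stand-in with the same setter names as the actions in the listing. */
    static final class LogicalExpression implements Expression {
        enum LogicalOperatorEnum { AND, OR }

        private Expression expression1;
        private Expression expression2;
        private LogicalOperatorEnum operator;

        void setExpression1(Expression e) { this.expression1 = e; }
        void setExpression2(Expression e) { this.expression2 = e; }
        void setLogicalOperatorEnum(LogicalOperatorEnum op) { this.operator = op; }

        @Override
        public String toString() {
            return "(" + operator + " " + expression1 + " " + expression2 + ")";
        }
    }

    /** Hypothetical helper: wraps the value built so far as the left operand. */
    static Expression combineLogical(Expression left,
                                     LogicalExpression.LogicalOperatorEnum op,
                                     Expression right) {
        LogicalExpression node = new LogicalExpression();
        node.setExpression1(left);
        node.setLogicalOperatorEnum(op);
        node.setExpression2(right);
        return node;
    }

    public static void main(String[] args) {
        // a && b || c, folded one loop pass at a time
        Expression result = new Value("a");
        result = combineLogical(result, LogicalExpression.LogicalOperatorEnum.AND, new Value("b"));
        result = combineLogical(result, LogicalExpression.LogicalOperatorEnum.OR, new Value("c"));
        System.out.println(result);
    }
}
```

In the grammar, the loop action would then pass the current $expression as
the left operand instead of $relationalExpression1.expression. The same
pattern would presumably apply to the relational, arithmetic and multiplying
rules, which are written the same way.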

Just the grammar without the Java code

======================================================

// Author: Atul Dambalkar (atul at entrib.com)

parser grammar TParser;

options {

    // Default language but name it anyway
    language  = Java;

    // output as AST
    output = AST;

    // Use a superclass to implement all helper methods, instance variables
    // and overrides of ANTLR default methods, such as error handling.
    //
    superClass = AbstractTParser;

    // Use the vocabulary generated by the accompanying lexer. Maven knows
    // how to work out the relationship between the lexer and parser and
    // will build the lexer before the parser. It will also rebuild the
    // parser if the lexer changes.
    tokenVocab = TLexer;
}

// Some imaginary tokens for tree rewrites
tokens {
    UNARYNOT;
    UNARYMINUS;
    UNARYEXISTS;
}

// What package should the generated source exist in?
@header {
    package com.entrib.poc.antlr;
}

prog
    : expr EOF
    ;

expr
    :
    relationalExpression
    (
        (
            AND^
            |
            OR^
        )
        relationalExpression
    )*
    ;

valueList
    : LB!
        (primitiveValue)
        (COMMA primitiveValue)*
      RB!
    ;

primitiveValue
    : INT
    |
    FLOAT
    |
    TIMESTAMP
    |
    STRING
    |
    ID
    |
    TAG
    |
    LP! expr RP!
    ;

primitiveElement
     :
     primitiveValue
     |
     valueList
     ;

existsExpression
     : EXISTS TAG
     ;

relationalExpression
     : arithExpression
     (
         (
             EQ^
             |
             NEQ^
             |
             GT^
             |
             GTE^
             |
             LT^
             |
             LTE^
             |
             IN^
             |
             NIN^
             |
             CONTAINS^
             |
             STARTS^
             |
             ENDS^
             |
             ALL^
             |
             NOR^
         )
         arithExpression
     )*
     ;

arithExpression
     : multiplyingExpression
     (
         (
             PLUS^
             |
             MINUS^
         )
         multiplyingExpression
     )*
     ;

signExpression
     :
     MINUS signExpression
     | existsExpression
     | primitiveElement
     ;

multiplyingExpression
     : signExpression
     (
         (
             MULTI^
             |
             DIV^
             |
             MOD^
         )
         signExpression
     )*
     ;
============================================================

-- 
Co-founder & CTO | Entrib Technologies | www.entrib.com  | Cell: +91 94223
15436 | Office: +91 20 4129 7982 | Email: atul at entrib.com <kiran at entrib.com>

