[antlr-interest] conditional tree rewrite

Jim Idle jimi at temporal-wave.com
Thu Sep 17 07:57:35 PDT 2009


Marco,

You can rewrite inside any alt so only rewrite as conditional if you  
take that alt. Also, avoid imaginary token bloat- you don't need the  
imaginaries in expression trees (generally) and can just use the ^  
operator. The AST is more about being unambiguous and easilly walkable  
than how it looks to you when you print the tree. You probably want to  
lose the 'literals' and make these lexer defined tokens. It will be  
easier when you build the tree walker.

Jim

On Sep 17, 2009, at 6:32 AM, Marco Trudel <marco at mtsystems.ch> wrote:

> Dear all
>
> I'm trying to adapt the "ANSI C grammar for ANTLR v3" so that it
> generates an AST. So my first step was to change all rules like:
>
> lvalue
>    :    unary_expression
>    ;
>
> to
>
> lvalue : lvalue2 -> ^(ART_LVALUE lvalue2);
> lvalue2
>    :    unary_expression
>    ;
>
> This already gives me a nice AST, except for expressions. For instance
> "0" will be represented as:
> ART_EXPRESSION
> '-ART_ASSIGNMENT_EXPRESSION
>   '- ART_CONDITIONAL_EXPRESSION
>      '- ART_LOGICAL_AND_EXPRESSION
>             .
>             .
>             .
>                '- ART_PRIMARY_EXPRESSION
>                   '- ART_CONSTANT
>                      '- 0
>
> I assume that the grammar has been build this way for performance
> reasons. The rule for conditional_expression for instance looks like:
>
> conditional_expression
>    : logical_or_expression ('?' expression ':'  
> conditional_expression)?
>    ;
>
> So it's actually not a conditional_expression if the part in  
> parentheses
> isn't there. So my initial rewrite:
>
> conditional_expression : conditional_expression2 ->
> ^(ART_CONDITIONAL_EXPRESSION conditional_expression2);
> conditional_expression2
>    : logical_or_expression ('?' expression ':'  
> conditional_expression)?
>    ;
>
> creates a "wrong" AST if it's actually not a conditional expression  
> and
> I'm now trying to create the tree conditionally. My first approach was
> to change the rules this way:
>
> conditional_expression
>    : conditional_expression2 -> ^(ART_CONDITIONAL_EXPRESSION
> conditional_expression2);
>    | logical_or_expression
>    ;
> conditional_expression2
>    : logical_or_expression '?' expression ':' conditional_expression
>    ;
>
> But, since there are 16 rules to adapt, after about 6 I run into
> "internal error: ...: could not even do k=1 for decision 49; reason:
> timed out (>1000ms)". So this creates too much ambiguity and  
> therefore I
> went for syntactic predicates:
>
> conditional_expression
>    : ( logical_or_expression '?' expression ':'  
> conditional_expression )
>        => logical_or_expression '?' expression ':'  
> conditional_expression ->
> ^(ART_CONDITIONAL_EXPRESSION logical_or_expression '?' expression ':'
> conditional_expression)
>        |  logical_or_expression
>    ;
>
> But the same problem (too ambiguous to compile). So I'm really stuck  
> now
> since I don't know how to do this correct (meaning efficient). I could
> imagine that this can be done with semantic predicates (I don't know
> how) but I'm not sure about the runtime overhead when parsing a C  
> file.
> Or I can go over the created tree myself and clean up the expressions.
> But the tree is already substantial bigger than it should be because  
> of
> the intermediate wrong "foo_expression", so I guess solving the  
> problem
> at the grammar level is better.
>
> Any opinions? Maybe I miss the completely obvious solution...
> I think I should add that I'm a complete ANTLR beginner; so  
> everything I
> wrote might make no sense at all.
>
>
> Thanks and have a nice day
> Marco
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address


More information about the antlr-interest mailing list