[antlr-interest] conditional tree rewrite
Marco Trudel
marco at mtsystems.ch
Fri Sep 18 05:50:49 PDT 2009
Dear Jim
Jim Idle wrote:
> Marco,
>
> You can rewrite inside any alt so only rewrite as conditional if you
> take that alt.
I'm afraid I'm too big an antlr newcomer to understand what you mean.
Can you maybe give an example? Maybe directly with the rule I made to
show my problem?
> Also, avoid imaginary token bloat- you don't need the
> imaginaries in expression trees (generally) and can just use the ^
> operator.
Also here, an example would be very helpful. I don't know how else I
could do without inserting imaginary tokens (or how to insert imaginary
tokens better).
Or do you only mean in expression trees like the one I mentioned from
"ART_EXPRESSION -> ART_ASSIGNMENT_EXPRESSION -> ... -> ART_CONSTANT ->
0" for a constant? If so, than that's exactly my question: how to do the
tree rewrite to only get "ART_EXPRESSION -> ART_CONSTANT -> 0"? I don't
see a way to insert the imaginary tokens like e.g.
"ART_LOGICAL_OR_EXPRESSION" only if it's a logical or expression for
this rule:
logical_or_expression
: logical_and_expression ('||' logical_and_expression)*
;
Because it's not a logical_or_expression if the parenthesis part is not
found. So, my goal is this:
logical_or_expression
: logical_or_expression2 -> ^(ART_LOGICAL_OR_EXPRESSION
logical_or_expression2)
| logical_and_expression
;
logical_or_expression2
: logical_and_expression ('||' logical_and_expression)+
;
But, as mentioned in my original mail, this leads to too much ambiguity
to compile the grammar. So, how to do exactly that without adding
ambiguity (the "|" pipes in case I use the wrong word for it)?
> The AST is more about being unambiguous and easilly walkable
> than how it looks to you when you print the tree. You probably want to
> lose the 'literals' and make these lexer defined tokens. It will be
> easier when you build the tree walker.
I don't need a tree walker. But of course I'm interested in having a
good grammar. Can you maybe also make an example of what you mean with
"lose the 'literals' and make these lexer defined tokens"?
Thanks for your time and have a nice day
Marco
> Jim
>
> On Sep 17, 2009, at 6:32 AM, Marco Trudel <marco at mtsystems.ch> wrote:
>
>> Dear all
>>
>> I'm trying to adapt the "ANSI C grammar for ANTLR v3" so that it
>> generates an AST. So my first step was to change all rules like:
>>
>> lvalue
>> : unary_expression
>> ;
>>
>> to
>>
>> lvalue : lvalue2 -> ^(ART_LVALUE lvalue2);
>> lvalue2
>> : unary_expression
>> ;
>>
>> This already gives me a nice AST, except for expressions. For instance
>> "0" will be represented as:
>> ART_EXPRESSION
>> '-ART_ASSIGNMENT_EXPRESSION
>> '- ART_CONDITIONAL_EXPRESSION
>> '- ART_LOGICAL_AND_EXPRESSION
>> .
>> .
>> .
>> '- ART_PRIMARY_EXPRESSION
>> '- ART_CONSTANT
>> '- 0
>>
>> I assume that the grammar has been build this way for performance
>> reasons. The rule for conditional_expression for instance looks like:
>>
>> conditional_expression
>> : logical_or_expression ('?' expression ':' conditional_expression)?
>> ;
>>
>> So it's actually not a conditional_expression if the part in parentheses
>> isn't there. So my initial rewrite:
>>
>> conditional_expression : conditional_expression2 ->
>> ^(ART_CONDITIONAL_EXPRESSION conditional_expression2);
>> conditional_expression2
>> : logical_or_expression ('?' expression ':' conditional_expression)?
>> ;
>>
>> creates a "wrong" AST if it's actually not a conditional expression and
>> I'm now trying to create the tree conditionally. My first approach was
>> to change the rules this way:
>>
>> conditional_expression
>> : conditional_expression2 -> ^(ART_CONDITIONAL_EXPRESSION
>> conditional_expression2);
>> | logical_or_expression
>> ;
>> conditional_expression2
>> : logical_or_expression '?' expression ':' conditional_expression
>> ;
>>
>> But, since there are 16 rules to adapt, after about 6 I run into
>> "internal error: ...: could not even do k=1 for decision 49; reason:
>> timed out (>1000ms)". So this creates too much ambiguity and therefore I
>> went for syntactic predicates:
>>
>> conditional_expression
>> : ( logical_or_expression '?' expression ':' conditional_expression )
>> => logical_or_expression '?' expression ':'
>> conditional_expression ->
>> ^(ART_CONDITIONAL_EXPRESSION logical_or_expression '?' expression ':'
>> conditional_expression)
>> | logical_or_expression
>> ;
>>
>> But the same problem (too ambiguous to compile). So I'm really stuck now
>> since I don't know how to do this correct (meaning efficient). I could
>> imagine that this can be done with semantic predicates (I don't know
>> how) but I'm not sure about the runtime overhead when parsing a C file.
>> Or I can go over the created tree myself and clean up the expressions.
>> But the tree is already substantial bigger than it should be because of
>> the intermediate wrong "foo_expression", so I guess solving the problem
>> at the grammar level is better.
>>
>> Any opinions? Maybe I miss the completely obvious solution...
>> I think I should add that I'm a complete ANTLR beginner; so everything I
>> wrote might make no sense at all.
>>
>>
>> Thanks and have a nice day
>> Marco
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe:
>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
More information about the antlr-interest
mailing list