[antlr-interest] conditional tree rewrite

Marco Trudel marco at mtsystems.ch
Fri Sep 18 05:50:49 PDT 2009


Dear Jim

Jim Idle wrote:
> Marco,
> 
> You can rewrite inside any alt so only rewrite as conditional if you 
> take that alt.

I'm afraid I'm too big an antlr newcomer to understand what you mean. 
Can you maybe give an example? Maybe directly with the rule I made to 
show my problem?

> Also, avoid imaginary token bloat- you don't need the 
> imaginaries in expression trees (generally) and can just use the ^ 
> operator.

Also here, an example would be very helpful. I don't know how else I 
could do without inserting imaginary tokens (or how to insert imaginary 
tokens better).
Or do you only mean in expression trees like the one I mentioned from 
"ART_EXPRESSION -> ART_ASSIGNMENT_EXPRESSION -> ... -> ART_CONSTANT -> 
0" for a constant? If so, than that's exactly my question: how to do the 
tree rewrite to only get "ART_EXPRESSION -> ART_CONSTANT -> 0"? I don't 
see a way to insert the imaginary tokens like e.g. 
"ART_LOGICAL_OR_EXPRESSION" only if it's a logical or expression for 
this rule:

logical_or_expression
	: logical_and_expression ('||' logical_and_expression)*
	;

Because it's not a logical_or_expression if the parenthesis part is not 
found. So, my goal is this:

logical_or_expression
	: logical_or_expression2 -> ^(ART_LOGICAL_OR_EXPRESSION 
logical_or_expression2)
	| logical_and_expression
	;
logical_or_expression2
	: logical_and_expression ('||' logical_and_expression)+
	;

But, as mentioned in my original mail, this leads to too much ambiguity 
to compile the grammar. So, how to do exactly that without adding 
ambiguity (the "|" pipes in case I use the wrong word for it)?

> The AST is more about being unambiguous and easilly walkable 
> than how it looks to you when you print the tree. You probably want to 
> lose the 'literals' and make these lexer defined tokens. It will be 
> easier when you build the tree walker.

I don't need a tree walker. But of course I'm interested in having a 
good grammar. Can you maybe also make an example of what you mean with 
"lose the 'literals' and make these lexer defined tokens"?


Thanks for your time and have a nice day
Marco


> Jim
> 
> On Sep 17, 2009, at 6:32 AM, Marco Trudel <marco at mtsystems.ch> wrote:
> 
>> Dear all
>>
>> I'm trying to adapt the "ANSI C grammar for ANTLR v3" so that it
>> generates an AST. So my first step was to change all rules like:
>>
>> lvalue
>>    :    unary_expression
>>    ;
>>
>> to
>>
>> lvalue : lvalue2 -> ^(ART_LVALUE lvalue2);
>> lvalue2
>>    :    unary_expression
>>    ;
>>
>> This already gives me a nice AST, except for expressions. For instance
>> "0" will be represented as:
>> ART_EXPRESSION
>> '-ART_ASSIGNMENT_EXPRESSION
>>   '- ART_CONDITIONAL_EXPRESSION
>>      '- ART_LOGICAL_AND_EXPRESSION
>>             .
>>             .
>>             .
>>                '- ART_PRIMARY_EXPRESSION
>>                   '- ART_CONSTANT
>>                      '- 0
>>
>> I assume that the grammar has been build this way for performance
>> reasons. The rule for conditional_expression for instance looks like:
>>
>> conditional_expression
>>    : logical_or_expression ('?' expression ':' conditional_expression)?
>>    ;
>>
>> So it's actually not a conditional_expression if the part in parentheses
>> isn't there. So my initial rewrite:
>>
>> conditional_expression : conditional_expression2 ->
>> ^(ART_CONDITIONAL_EXPRESSION conditional_expression2);
>> conditional_expression2
>>    : logical_or_expression ('?' expression ':' conditional_expression)?
>>    ;
>>
>> creates a "wrong" AST if it's actually not a conditional expression and
>> I'm now trying to create the tree conditionally. My first approach was
>> to change the rules this way:
>>
>> conditional_expression
>>    : conditional_expression2 -> ^(ART_CONDITIONAL_EXPRESSION
>> conditional_expression2);
>>    | logical_or_expression
>>    ;
>> conditional_expression2
>>    : logical_or_expression '?' expression ':' conditional_expression
>>    ;
>>
>> But, since there are 16 rules to adapt, after about 6 I run into
>> "internal error: ...: could not even do k=1 for decision 49; reason:
>> timed out (>1000ms)". So this creates too much ambiguity and therefore I
>> went for syntactic predicates:
>>
>> conditional_expression
>>    : ( logical_or_expression '?' expression ':' conditional_expression )
>>        => logical_or_expression '?' expression ':' 
>> conditional_expression ->
>> ^(ART_CONDITIONAL_EXPRESSION logical_or_expression '?' expression ':'
>> conditional_expression)
>>        |  logical_or_expression
>>    ;
>>
>> But the same problem (too ambiguous to compile). So I'm really stuck now
>> since I don't know how to do this correct (meaning efficient). I could
>> imagine that this can be done with semantic predicates (I don't know
>> how) but I'm not sure about the runtime overhead when parsing a C file.
>> Or I can go over the created tree myself and clean up the expressions.
>> But the tree is already substantial bigger than it should be because of
>> the intermediate wrong "foo_expression", so I guess solving the problem
>> at the grammar level is better.
>>
>> Any opinions? Maybe I miss the completely obvious solution...
>> I think I should add that I'm a complete ANTLR beginner; so everything I
>> wrote might make no sense at all.
>>
>>
>> Thanks and have a nice day
>> Marco
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: 
>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address



More information about the antlr-interest mailing list