[antlr-interest] conditional tree rewrite

Marco Trudel marco at mtsystems.ch
Thu Sep 17 06:32:42 PDT 2009


Dear all

I'm trying to adapt the "ANSI C grammar for ANTLR v3" so that it 
generates an AST. So my first step was to change all rules like:

lvalue
	:	unary_expression
	;

to

lvalue : lvalue2 -> ^(ART_LVALUE lvalue2);
lvalue2
	:	unary_expression
	;

This already gives me a nice AST, except for expressions. For instance 
"0" will be represented as:
ART_EXPRESSION
'-ART_ASSIGNMENT_EXPRESSION
   '- ART_CONDITIONAL_EXPRESSION
      '- ART_LOGICAL_AND_EXPRESSION
             .
             .
             .
                '- ART_PRIMARY_EXPRESSION
                   '- ART_CONSTANT
                      '- 0

I assume that the grammar has been build this way for performance 
reasons. The rule for conditional_expression for instance looks like:

conditional_expression
	: logical_or_expression ('?' expression ':' conditional_expression)?
	;

So it's actually not a conditional_expression if the part in parentheses 
isn't there. So my initial rewrite:

conditional_expression : conditional_expression2 -> 
^(ART_CONDITIONAL_EXPRESSION conditional_expression2);
conditional_expression2
	: logical_or_expression ('?' expression ':' conditional_expression)?
	;

creates a "wrong" AST if it's actually not a conditional expression and 
I'm now trying to create the tree conditionally. My first approach was 
to change the rules this way:

conditional_expression
	: conditional_expression2 -> ^(ART_CONDITIONAL_EXPRESSION 
conditional_expression2);
	| logical_or_expression
	;
conditional_expression2
	: logical_or_expression '?' expression ':' conditional_expression
	;

But, since there are 16 rules to adapt, after about 6 I run into 
"internal error: ...: could not even do k=1 for decision 49; reason: 
timed out (>1000ms)". So this creates too much ambiguity and therefore I 
went for syntactic predicates:

conditional_expression
	: ( logical_or_expression '?' expression ':' conditional_expression )
		=> logical_or_expression '?' expression ':' conditional_expression -> 
^(ART_CONDITIONAL_EXPRESSION logical_or_expression '?' expression ':' 
conditional_expression)
		|  logical_or_expression
	;

But the same problem (too ambiguous to compile). So I'm really stuck now 
since I don't know how to do this correct (meaning efficient). I could 
imagine that this can be done with semantic predicates (I don't know 
how) but I'm not sure about the runtime overhead when parsing a C file. 
Or I can go over the created tree myself and clean up the expressions. 
But the tree is already substantial bigger than it should be because of 
the intermediate wrong "foo_expression", so I guess solving the problem 
at the grammar level is better.

Any opinions? Maybe I miss the completely obvious solution...
I think I should add that I'm a complete ANTLR beginner; so everything I 
wrote might make no sense at all.


Thanks and have a nice day
Marco


More information about the antlr-interest mailing list