[antlr-interest] seeking advice for a good approach

Austin Hastings Austin_Hastings at Yahoo.com
Sat Oct 20 17:02:15 PDT 2007


Geoff,

I think that's the purpose of the "tree grammar" stuff. In the first 
(syntax) grammar, you convert a parse tree into an AST. The point of the 
AST is to be free of parse-tree stuff, totally clear as to purpose, and 
easy to work with.

Consider the various things you had to do to implement operator 
precedence in your grammar: an expression is an and_expr, an and_expr is 
one or more or_expr, an or_expr is one or more plus_expr, a plus_expr is 
one or more mult_expr, etc. That goes away once you have your AST, 
because you build your AST so that the tree structure encodes the order 
of evaluation.

Your "semantic" parser (tree grammar) just has one rule for expr: an 
expr is a primary, or an expr + an operator + an expr. An operator is 
any of (+,-,/,*,%,^,=,etc.) Because the tree itself will specify order, 
your semantic parse can be really simple on this score. But what are you 
going to DO?

I think you'll be faced with several semantic passes, depending on your 
target: a compiler will have maybe more than an interpreter. A 
"compiling interpreter" might be the worst possible case. There's some 
disagreement whether you should use a tree grammar, or use a 
tree-walking code module (the Visitor pattern gets mentioned here, but 
it's not the only way).

You specifically mentioned type checking of operands. My solution to 
that is to build a map<node,info> for the expressions, and then use the 
sub-expressions to determine the type of operator to be called. Because 
the tree objects can serve as keys in a map, you don't have to extend 
the commontree class. You can attach the data via some other mechanism, 
such as your symbol table.

Please keep in mind that depending on the complexity of the language, 
you're going to want both "centralized" and "decentralized" data 
storage. You don't want to encode the info into every node, or every 
token, but the tokens or nodes might be the right source of location 
data -- an identifier 'foo' in one place isn't necessarily the same as 
'foo' in another place.

Another advantage of this approach is that you get away from throwing 
exceptions whenever your parser is unhappy. For error diagnosis and 
recovery, I'm taking the approach that my syntactic parser should accept 
as much as possible, and that I'll generate hopefully-helpful 
diagnostics from later stages. I think "There is a type incompatibility 
between the left- and right- sides of the '*' operator on line 262" is a 
better message than "Parse failed at line 262".

=Austin



Geoff hendrey wrote:
>
> Hi,
>
> I am building an AST using the "->" functionality in the grammar file.
>
> The AST contains various types of operator nodes that operate against 
> two operands (child operand nodes).
>
> For example, a logical AND operation, which has a left-hand-side 
> operand and a right-hand-side operand.
>
> I want to examine each AST node, and insure that both the child nodes 
> (the operands) are compatible with the operation being performed on them.
>
> Here is the approach I am going to take, but maybe some ANTLR genius 
> knows a better way.
>
> So my plan is to extend CommonTree, to say TypeCheckingTree. When  new 
> TypeCheckingTree is constructed, I'll record the type of the operator, 
> if the node is an operator. Then when addChild is called, I will throw 
> an Exception if the child (operand) is incompatible. For example if 
> the node is 'AND' and a child is the literal  '3.14159', an Exception 
> will be thrown since floating point numbers are not acceptable 
> operands to a boolean 'AND' function.
>
> Right now, the only way I can think of to record the operator type, is 
> to make some function like this, which would be called from the 
> constructor of TypeCheckingTree. This will allow the node to know it's 
> type, and to check the type of child nodes (operands) for compatibility :
>
> public String getType(Token t){
>     String astNodeName = t.toString();
>     if(astNodeName.equals("AND")){
>        return "BOOLEAN"
>     }else if(astNodeName.equals("OR")){
>        return "BOOLEAN"
>     }else if(astNodeName.equals("FLOAT")){
>        return "NUMERIC";
>     }//etc,etc.
> }
>
> The above seems really flaky. Especially since "etc, etc" is a really 
> long list, and has to be maintained. It would be better if somehow I 
> could assign a  type  (or somehow attach meta data) to an AST node, 
> from inside the grammar file. Relying on String comparisons against 
> the node's name is pretty weak.
>
> I'm hoping I am an ignorant newbie and someone will slap me straight!!!
>
> -g
> ------------------------------------------------------------------------
>
> No virus found in this incoming message.
> Checked by AVG Free Edition. 
> Version: 7.5.488 / Virus Database: 269.15.3/1081 - Release Date: 10/19/2007 5:41 PM
>   



More information about the antlr-interest mailing list