[antlr-interest] seeking advice for a good approach
Austin Hastings
Austin_Hastings at Yahoo.com
Sat Oct 20 17:02:15 PDT 2007
Geoff,
I think that's the purpose of the "tree grammar" stuff. In the first
(syntax) grammar, you convert a parse tree into an AST. The point of the
AST is to be free of parse-tree stuff, totally clear as to purpose, and
easy to work with.
Consider the various things you had to do to implement operator
precedence in your grammar: an expression is an and_expr, an and_expr is
one or more or_expr, an or_expr is one or more plus_expr, a plus_expr is
one or more mult_expr, etc. That goes away once you have your AST,
because you build your AST so that the tree structure encodes the order
of evaluation.
Your "semantic" parser (tree grammar) just has one rule for expr: an
expr is a primary, or an expr + an operator + an expr. An operator is
any of (+,-,/,*,%,^,=,etc.) Because the tree itself will specify order,
your semantic parse can be really simple on this score. But what are you
going to DO?
I think you'll be faced with several semantic passes, depending on your
target: a compiler will have maybe more than an interpreter. A
"compiling interpreter" might be the worst possible case. There's some
disagreement whether you should use a tree grammar, or use a
tree-walking code module (the Visitor pattern gets mentioned here, but
it's not the only way).
You specifically mentioned type checking of operands. My solution to
that is to build a map<node,info> for the expressions, and then use the
sub-expressions to determine the type of operator to be called. Because
the tree objects can serve as keys in a map, you don't have to extend
the commontree class. You can attach the data via some other mechanism,
such as your symbol table.
Please keep in mind that depending on the complexity of the language,
you're going to want both "centralized" and "decentralized" data
storage. You don't want to encode the info into every node, or every
token, but the tokens or nodes might be the right source of location
data -- an identifier 'foo' in one place isn't necessarily the same as
'foo' in another place.
Another advantage of this approach is that you get away from throwing
exceptions whenever your parser is unhappy. For error diagnosis and
recovery, I'm taking the approach that my syntactic parser should accept
as much as possible, and that I'll generate hopefully-helpful
diagnostics from later stages. I think "There is a type incompatibility
between the left- and right- sides of the '*' operator on line 262" is a
better message than "Parse failed at line 262".
=Austin
Geoff hendrey wrote:
>
> Hi,
>
> I am building an AST using the "->" functionality in the grammar file.
>
> The AST contains various types of operator nodes that operate against
> two operands (child operand nodes).
>
> For example, a logical AND operation, which has a left-hand-side
> operand and a right-hand-side operand.
>
> I want to examine each AST node, and insure that both the child nodes
> (the operands) are compatible with the operation being performed on them.
>
> Here is the approach I am going to take, but maybe some ANTLR genius
> knows a better way.
>
> So my plan is to extend CommonTree, to say TypeCheckingTree. When new
> TypeCheckingTree is constructed, I'll record the type of the operator,
> if the node is an operator. Then when addChild is called, I will throw
> an Exception if the child (operand) is incompatible. For example if
> the node is 'AND' and a child is the literal '3.14159', an Exception
> will be thrown since floating point numbers are not acceptable
> operands to a boolean 'AND' function.
>
> Right now, the only way I can think of to record the operator type, is
> to make some function like this, which would be called from the
> constructor of TypeCheckingTree. This will allow the node to know it's
> type, and to check the type of child nodes (operands) for compatibility :
>
> public String getType(Token t){
> String astNodeName = t.toString();
> if(astNodeName.equals("AND")){
> return "BOOLEAN"
> }else if(astNodeName.equals("OR")){
> return "BOOLEAN"
> }else if(astNodeName.equals("FLOAT")){
> return "NUMERIC";
> }//etc,etc.
> }
>
> The above seems really flaky. Especially since "etc, etc" is a really
> long list, and has to be maintained. It would be better if somehow I
> could assign a type (or somehow attach meta data) to an AST node,
> from inside the grammar file. Relying on String comparisons against
> the node's name is pretty weak.
>
> I'm hoping I am an ignorant newbie and someone will slap me straight!!!
>
> -g
> ------------------------------------------------------------------------
>
> No virus found in this incoming message.
> Checked by AVG Free Edition.
> Version: 7.5.488 / Virus Database: 269.15.3/1081 - Release Date: 10/19/2007 5:41 PM
>
More information about the antlr-interest
mailing list