[antlr-interest] Re: High level semantic analysis

Tue May 20 11:32:15 PDT 2003

See the GCC grammar www.codetransform.com/gcc.html for some ideas which
aren't completely hooked up:

CSymbolTable.java--Parser add entries and pushes/pops scope.  It currently
only looks up entries to resolve whether something is a typedef name or not.
But the symbol table is complete and could be used to do other things.

TNode.java--has support for double linking (so a node can find its parent)
which is currently unused.  To use it you would first build the complete
tree and then call doubleLink().  TNode Also has a slot for the defNode, the
idea being to have a tree pass which annotates every use of a variable with
the definition to make it easy to find the definition without having to keep
track of the current scope.  It also has a Hashtable of attributes so you
could put anything you want in for the node.  That is how you decorate trees
in ANTLR.

Note that the above stuff is best for analysis but for transformation you
might have extra bookkeeping to do, particularly if you are deleting or
changing definitions.  

In answer to question 1) ANTLR isn't in the business of bookkeeping, it just
does lexing/parsing/tree parsing/tree building.  Everything else is up to
you.  Want more information in your Tokens or ASTs?  Just subclass and put
it there.  Need a symbol table?  Make one.  

2) AST is just an interface, right?  In java that's easy, I subclass
CommonToken.java because that has most of the functionality I need.  For C++
I'll defer to other experts such as Ric or
http://www.imada.sdu.dk/%7Emorling/.

It seems ANTLR needs more examples/articles/tutorials of proper use of
symbol tables and use/def information.  I'll put on my list to write up some
like this for the GCC grammar.

More on symbol tables http://www.bearcave.com/software/java/java_symtab.html
and articles from
http://developer.java.sun.com/developer/technicalArticles/Parser/ --
specifically
http://developer.java.sun.com/developer/technicalArticles/Parser/SeriesPt4/

Monty

-----Original Message-----
From: Tiller, Michael (M.M.) [mailto:mtiller at ford.com] 
Sent: Tuesday, May 20, 2003 5:52 AM
To: 'antlr-interest at yahoogroups.com'
Subject: RE: [antlr-interest] Re: High level semantic analysis

> From: lgcraymer [mailto:lgc at mail1.jpl.nasa.gov]
> Subject: [antlr-interest] Re: High level semantic analysis
>
> ANTLR's approach to the general problem is syntax tree
> transformation (possibly multiple passes) followed by an "output" 
> pass which might generate an internal form that might not be tree 
> structured or might produce code.  Ter is working towards an "output 
> template" approach for this final pass--details are still being 
> worked out.  I think that their "rule based semantic analysis" could 
> be effected through one or more transformation passes, one or more 
> linking passes (tree walks which build links connecting AST nodes), 
> and a resolution pass to "decorate the tree" (attribute grammar 
> terminology).

Is there any built-in support for the "linking" and "decorating" you talk
about.  Being relatively inexperienced in this aspect, I found this to be
one of the more confusing parts.  It was easy for me to see how the lexer,
the parser and even the tree walker worked (and why they existed).

ANTLR was great for getting to the point where all the information was
nicely arranged in a tree.  What I struggle with is "Now what?".  For
example, one of the first passes I could imagine doing on my AST would be
type/name lookup.  Once I've done that, I'd like to keep that information
around.  I'm assuming that is what you mean by linking.

Let me give you a quick example.  A simple Modelica package might look like
this:

package Ford
  package Types
    type Torque = Real(quantity="Torque", units="N.m");
    type Pressure = Real(quantity="Pressure", units="N/m^2");
  end Types;

  model Test1
    Ford.Types.Torque tau;
    Types.Pressure P;
  end Test1;
end Ford;

In order to elaborate the model Test1, I need to find the information
associated with the variables "tau" and "P".  Note that while the definition
of "Torque" and "Pressure" are in the same place (Ford.Types.*), I can
reference them in different ways (Ford.Types.Torque vs. Types.Torque) but
the semantics are the same in either case.

So, I have an AST node for my declaration that probably looks something like
this:

#(DECL #(TYPENAME "Ford" "Types" "Torque") #(COMPONENT "tau"))

Somewhere else in my AST I have something like:

#(PACKAGE "Ford" ... #(PACKAGE "Types" #(TYPE "Torque" ...) #(TYPE
"Pressure")))

What I'd like to be able to do is walk the tree and find every TYPENAME and
then somehow link, as you say, that #(TYPENAME ...) node back to the #(TYPE
"Torque") node.

BUT, what I don't want is this:

#(DECL #(TYPENAME "Ford" "Types" "Torque" #(TYPE "Torque" ...)) #(COMPONENT
"tau"))

right?  This would change my tree grammar.  I want links, but I don't want
ANTLR to pay attention to those links for the purposes of tree walking.

Does this sound right or is my mental model of all this wrong?

Assuming I am right (which is a big assumption), I didn't see anything in
ANTLR that supports these kinds of associations.  Am I missing something?  I
know I can define my own AST node types and I could add some data structures
to support linking but there were two reasons I didn't do this:

1) I figured if this is what I'm supposed to do then everybody would have to
do it and ANTLR would support it somehow.  The fact that I didn't see this
functionality in ANTLR led me to think perhaps I wasn't going about this the
correct way.

2) I was quite concerned about how all this linking and associating would
impact the garbage collection system for AST nodes.  To be honest, I find
the whole process of defining a new AST type pretty complicated.  There seem
to be so many different AST types and I never know when to use one or the
other (e.g when do I use my AST type and when do I use RefAST, etc).

Now I'm not really blaming ANTLR for either of these issues.  I'm just
saying that for me this was somewhat intimidating and so I was hesitant to
get all tangled up in implementing these details.

These are just some things to consider in the development of ANTLR 3 that
might make the tool easier and more approachable to people like me.

--
Mike

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/