[antlr-interest] Representations of AST

Foust javafoust at gmail.com
Thu Apr 2 11:24:44 PDT 2009


Wow, Andy. So well said.

That goes for both the difficulty of parsing C/C++ and the state of tree
walkers. It's so easy to shoot yourself in the foot using Antlr. It can
simplify the initial design when you're still figuring out what you want to
do, but as the codebase grows, you may think it's removing complexity when
it's really just adding to it.

For the simpler cases, Antlr's tree walker seems very nice. And there are
many smaller DSLs that can benefit from such mechanisms to help their
creator get them up and running quickly. But as you've stated time and time
again, as the complexity of the language grows or if the handling of the AST
requires a lot of extra logic, the value of using the built-in syntax
quickly diminishes.
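
To make the alternative concrete, here is a minimal sketch of a hand-written
walker: plain code that accepts an AST as an argument and does its own
traversal. Node, ColumnCollector and the token type names are hypothetical
stand-ins made up for this illustration, not ANTLR classes; a real project
would walk whatever tree type its parser produces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal AST node, standing in for the parser's real tree type.
final class Node {
    final String type;   // token type name, e.g. "COLUMN" (assumed for this sketch)
    final String text;   // token text, e.g. a column name
    final List<Node> children = new ArrayList<>();

    Node(String type, String text, Node... kids) {
        this.type = type;
        this.text = text;
        for (Node k : kids) {
            children.add(k);
        }
    }
}

// Hand-written walker: takes the AST as an argument and recurses over it itself.
final class ColumnCollector {
    // Returns the text of every COLUMN node in depth-first, left-to-right order.
    static List<String> collect(Node root) {
        List<String> out = new ArrayList<>();
        walk(root, out);
        return out;
    }

    private static void walk(Node n, List<String> out) {
        if (n.type.equals("COLUMN")) {
            out.add(n.text);
        }
        for (Node child : n.children) {
            walk(child, out);
        }
    }
}

class WalkerDemo {
    public static void main(String[] args) {
        Node ast = new Node("SELECT", "select",
                new Node("COLUMN", "id"),
                new Node("COLUMN", "name"),
                new Node("FROM", "from", new Node("TABLE", "users")));
        System.out.println(ColumnCollector.collect(ast)); // prints [id, name]
    }
}
```

All of the logic lives in ordinary methods, so it can grow, be refactored and
be unit tested without touching a grammar file.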

Brent

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org 
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Andy Tripp
> Sent: 2009-04-02 11:03
> To: Alexander Brown
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Representations of AST
> 
> Alexander,
> 
> The crux of what I say here:
> http://www.jazillian.com/articles/treewalkers.html
> is that as the amount of logic needed in your treewalker 
> grows, the ANTLR treewalker doesn't really help. You start 
> off with a few simple actions triggered at various points of 
> treewalking, but then it grows into a large chunk of code 
> where it doesn't really help to have that code triggered at 
> certain points in a treewalk.
> Then you suspect it would be simpler to have your code just 
> accept an AST as an argument, do its own walking, and throw 
> out the ANTLR treewalker.
> 
> I don't have any good answers on how to "encapsulate my 
> semantic representation code" better. I've found that when my 
> AST isn't quite the shape that I want, I have lots of trouble 
> getting ANTLR to create the AST that I want. But maybe that's just me.
> 
> As for your semantic model that you produce from an AST, all 
> I can say is that I'm now trying to do simple code 
> instrumentation into C code, and I'm now on my fourth 
> redesign of my model. Just figuring out a variable's type 
> with all the typedefs, structs, arrays, pointers, etc. is 
> really hard. Given a declaration "MYTYPE **v[1][2];" and a 
> reference "*(a.f().v[3] + n)", what type is the reference? I 
> could spend the rest of my life staring at C ASTs.
> 
> So I feel your pain.
> I was also shocked to find that the SQL standard was about 
> 1000 pages, and the language approaches C++ in complexity. 
> Someone needs to do for SQL (and C++) what XML did for SGML: 
> strip out the 80% that's cruft.
> 
> I know Alexandre Porcelli was also working on an SQL grammar.
> 
> Andy
> 
> 
> 
> Alexander Brown wrote:
> > Hi,
> >  
> > Perhaps this will sound like a rather stupid question, but I am 
> > wondering if there is a better way to approach the problem I am 
> > trying to solve.
> >  
> > I am interested in parsing SQL.  I have developed a grammar based on 
> > the (overly complex) SQL2003 specification for my corpus (something 
> > like 1GB+) of SQL statements. I've also built a treewalker that 
> > walks my AST.
> >  
> > My application is currently converting my AST into a Java-based 
> > semantic object model that, for all intents and purposes, reflects 
> > the structure of the AST on a 1:1 basis.  For my application, I need 
> > an object model based representation of SQL.
> >  
> > Building the object model and matching stringtemplate library has 
> > been extremely time consuming: there are something like 1000 rules 
> > in the SQL2003 spec and I have also built composite grammars that 
> > handle a superset of the spec such as DB specific constructs 
> > (old-school Oracle outer join syntax, for example) and procedural 
> > wrappers like PLSQL.  My treewalker has thus become intermingled 
> > with vast amounts of Java that builds my semantic model and my Java 
> > object model has, of course, a large number of classes.  I am 
> > beginning to think that I have done this wrong.
> >  
> > After the horse has bolted, I am wondering: was there a better way 
> > to approach this?  I am particularly keen to encapsulate my semantic 
> > representation code and embed little or no Java in my TreeWalker 
> > (even if the 1:1 mapping remains).  I think I have missed a step 
> > somewhere.
> >  
> > Thanks for your input.
> > 
> > Regards,
> >  
> > Alex
> > 
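
Andy's type question above can be worked through with a small model. The
sketch below is a heavily simplified illustration, not a real C type checker.
It assumes that MYTYPE is an opaque typedef, that "a.f().v" has already been
resolved to the member declared as "MYTYPE **v[1][2];", and that n is an
integer. CType, Named, PointerTo and ArrayOf are hypothetical names invented
for the example.

```java
// Simplified model of C types: named types, pointers and arrays only.
abstract class CType {
    // Array-to-pointer decay, applied when an array is used as a value.
    CType decay() {
        return this;
    }

    // Type of *e for an expression e of this type.
    CType deref() {
        CType t = decay();
        if (t instanceof PointerTo) {
            return ((PointerTo) t).target;
        }
        throw new IllegalStateException("cannot dereference " + this);
    }

    // Type of e[i]; C defines e[i] as *(e + i), so the type matches deref().
    CType index() {
        return deref();
    }

    // Type of e + n for an integer n: the (decayed) pointer type is kept.
    // Integer-plus-integer arithmetic is deliberately not modelled here.
    CType plusInt() {
        CType t = decay();
        if (t instanceof PointerTo) {
            return t;
        }
        throw new IllegalStateException("not a pointer or array: " + this);
    }
}

final class Named extends CType {
    final String name;
    Named(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

final class PointerTo extends CType {
    final CType target;
    PointerTo(CType target) { this.target = target; }
    @Override public String toString() { return "pointer to " + target; }
}

final class ArrayOf extends CType {
    final int size;
    final CType element;
    ArrayOf(int size, CType element) { this.size = size; this.element = element; }
    @Override CType decay() { return new PointerTo(element); }
    @Override public String toString() { return "array[" + size + "] of " + element; }
}

class TypeDemo {
    public static void main(String[] args) {
        // MYTYPE **v[1][2]: array[1] of array[2] of pointer to pointer to MYTYPE
        CType v = new ArrayOf(1, new ArrayOf(2,
                new PointerTo(new PointerTo(new Named("MYTYPE")))));
        // *(v[3] + n): index, add an integer, then dereference
        CType result = v.index().plusInt().deref();
        System.out.println(result); // prints pointer to pointer to MYTYPE
    }
}
```

Under those assumptions the reference has type MYTYPE **. The steps skipped
here (typedef expansion, struct member lookup, function return types,
qualifiers) are where most of the real difficulty lives.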


