[antlr-interest] Representations of AST

Sam Barnett-Cormack s.barnett-cormack at lancaster.ac.uk
Thu Apr 2 11:31:13 PDT 2009


Foust wrote:
> Wow, Andy. So well-said. 
> 
> Both about the difficulty of parsing C/C++ and the state of tree walkers.
> It's so easy to shoot yourself in the foot using Antlr. It can simplify the
> initial design when you're still figuring out what you want to do, but as
> the codebase grows, you can be thinking it's removing complexity when it's
> really just adding to it.
> 
> For the simpler cases, Antlr's tree walker seems very nice. And there are
> many smaller DSLs that can benefit from such mechanisms to help their
> creator get them up and running quickly. But as you've stated time and time
> again, as the complexity of the language grows or if the handling of the AST
> requires a lot of extra logic, the value of using the built-in syntax
> quickly diminishes.

In my implementation of ASN.1 (a language which is handily declarative), 
I'm going for a half-and-half approach: using a tree grammar to put 
together a not-entirely-verified Abstract Syntax Model out of Java 
classes, and then doing further processing on that model (verifying 
references, type checking, and so on) iteratively, eventually leading 
to code being emitted using StringTemplate.
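A minimal sketch of that second stage, assuming a drastically simplified model: each ASN.1 type assignment records only the name it refers to, the tree grammar would fill these objects in without checking anything, and a later pass in plain Java resolves the names. The class names (`TypeAssignment`, `Asn1Module`, `ReferenceResolver`) and the builtin list are hypothetical, not taken from Sam's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model class: the tree grammar records only names, leaving
// references unresolved until a later pass over the finished model.
final class TypeAssignment {
    final String name;           // e.g. "Age" in "Age ::= INTEGER"
    final String referencedType; // a builtin such as "INTEGER", or another assignment's name
    TypeAssignment target;       // set by the resolution pass; stays null for builtins

    TypeAssignment(String name, String referencedType) {
        this.name = name;
        this.referencedType = referencedType;
    }
}

// Hypothetical container for one module's type assignments, keyed by name.
final class Asn1Module {
    final Map<String, TypeAssignment> assignments = new LinkedHashMap<>();

    void add(TypeAssignment a) {
        assignments.put(a.name, a);
    }
}

// One verification pass over plain Java objects; no tree grammar involved.
final class ReferenceResolver {
    // Illustrative subset only; real ASN.1 has many more builtin types.
    static final Set<String> BUILTINS = Set.of("INTEGER", "BOOLEAN", "OCTET STRING");

    // Links each reference to its definition and returns one message per undefined name.
    static List<String> resolve(Asn1Module module) {
        List<String> errors = new ArrayList<>();
        for (TypeAssignment a : module.assignments.values()) {
            if (BUILTINS.contains(a.referencedType)) {
                continue;
            }
            TypeAssignment definition = module.assignments.get(a.referencedType);
            if (definition == null) {
                errors.add(a.name + ": undefined type " + a.referencedType);
            } else {
                a.target = definition;
            }
        }
        return errors;
    }
}
```

Because the resolver works on ordinary objects, further passes (type checking, then code emission via StringTemplate) can be added as more methods over the same model rather than as additional tree grammars.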

Just in case that idea is of interest to anyone. For this application it 
just seemed obvious to me - trying to use an ANTLR treewalker, or any 
treewalker, really, to check all of that would require a load of extra 
classes and objects to keep track of things, and several passes using 
different tree grammars. Euch.

Sam

>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org 
>> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Andy Tripp
>> Sent: 2009-04-02 11:03
>> To: Alexander Brown
>> Cc: antlr-interest at antlr.org
>> Subject: Re: [antlr-interest] Representations of AST
>>
>> Alexander,
>>
>> The crux of what I say here:
>> http://www.jazillian.com/articles/treewalkers.html
>> is that as the amount of logic needed in your treewalker 
>> grows, the ANTLR treewalker doesn't really help. You start 
>> off with a few simple actions triggered at various points of 
>> treewalking, but then it grows into a large chunk of code 
>> where it doesn't really help to have that code triggered at 
>> certain points in a treewalk.
>> Then you suspect things would be simpler to have your code 
>> just accept an AST as an argument and do its own walking, and 
>> throw out the ANTLR treewalker.
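A minimal sketch of the hand-written alternative Andy describes, assuming a hypothetical `Node` class standing in for the parser's AST type (it is not ANTLR's tree API): the code receives the root and controls its own recursion, so context such as "inside a subquery" is just a method parameter. The node type names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the parser's AST node type (not ANTLR's own class).
final class Node {
    final String type;  // node kind, e.g. "SELECT", "FROM", "TABLE", "SUBQUERY"
    final String text;  // token text, e.g. a table name
    final List<Node> children = new ArrayList<>();

    Node(String type, String text, Node... kids) {
        this.type = type;
        this.text = text;
        children.addAll(List.of(kids));
    }
}

// Plain Java that takes the AST as an argument and does its own walking.
final class TableCollector {
    private final List<String> tables = new ArrayList<>();

    static List<String> collect(Node root) {
        TableCollector collector = new TableCollector();
        collector.visit(root, false);
        return collector.tables;
    }

    private void visit(Node n, boolean inSubquery) {
        if (n.type.equals("TABLE")) {
            tables.add(inSubquery ? n.text + " (subquery)" : n.text);
        }
        // Context flows down as an ordinary parameter instead of grammar-level state.
        boolean childInSubquery = inSubquery || n.type.equals("SUBQUERY");
        for (Node child : n.children) {
            visit(child, childInSubquery);
        }
    }
}
```

For a tree with a `TABLE` node `orders` and a `SUBQUERY` node containing a `TABLE` node `customers`, `TableCollector.collect` returns `[orders, customers (subquery)]`.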
>>
>> I don't have any good answers on how to "encapsulate my 
>> semantic representation code" better. I've found that when my 
>> AST isn't quite the shape that I want, I have lots of trouble 
>> getting ANTLR to create the AST that I want. But maybe that's just me.
>>
>> As for your semantic model that you produce from an AST, all 
>> I can say is that I'm now trying to do simple code 
>> instrumentation into C code, and I'm now on my fourth 
>> redesign of my model. Just to figure out a variable's type 
>> with all the typedefs, structs, arrays, pointers, etc.
>> is really hard. Given a declaration "MYTYPE **v[1][2];" and a 
>> reference "*(a.f().v[3] + n)", what type is the reference? I 
>> could spend the rest of my life staring at C ASTs.
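To illustrate why this gets hard, here is a minimal sketch that derives the type of that reference by hand, assuming `a.f()` yields a struct whose member `v` is declared as `MYTYPE **v[1][2]`, that `n` is an integer, and that `MYTYPE` is treated as opaque (no typedef expansion). The records and the `CTypeOps` helper are hypothetical and need Java 16 or later for records and `instanceof` patterns.

```java
// Minimal C type model: an opaque named type, pointers, and arrays.
interface CType {
    String describe();
}

record Named(String name) implements CType {
    public String describe() { return name; }
}

record Pointer(CType target) implements CType {
    public String describe() { return "pointer to " + target.describe(); }
}

record Array(int length, CType element) implements CType {
    public String describe() { return "array[" + length + "] of " + element.describe(); }
}

final class CTypeOps {
    // In most expression contexts an array of T "decays" to a pointer to T.
    static CType decay(CType t) {
        return (t instanceof Array a) ? new Pointer(a.element()) : t;
    }

    // *e: the operand must be a pointer after decay; the result is what it points to.
    static CType deref(CType t) {
        CType d = decay(t);
        if (!(d instanceof Pointer p)) {
            throw new IllegalArgumentException("cannot dereference " + t.describe());
        }
        return p.target();
    }

    // e + n with integer n: pointer arithmetic keeps the decayed pointer type.
    static CType plusInt(CType t) {
        CType d = decay(t);
        if (!(d instanceof Pointer)) {
            throw new IllegalArgumentException("not a pointer: " + t.describe());
        }
        return d;
    }

    // e[i] is defined as *(e + i).
    static CType index(CType t) {
        return deref(plusInt(t));
    }
}
```

Applying the operations from the inside out, `v[3]` gives `array[2] of pointer to pointer to MYTYPE`, `+ n` decays that to a pointer to `MYTYPE **`, and the outer `*` yields `pointer to pointer to MYTYPE`, i.e. `MYTYPE **`. The index 3 is past the declared bound of 1, but that is a runtime problem (undefined behavior) and does not change the static type. Even this toy leaves out typedef resolution, struct member lookup, function return types, and qualifiers, which is where most of the real work is.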
>>
>> So I feel your pain.
>> I was also shocked to find that the SQL standard was about 
>> 1000 pages, and the language approaches C++ in complexity. 
>> Someone needs to do for SQL (and C++) what XML did for SGML: 
>> strip out the 80% that's cruft.
>>
>> I know Alexandre Porcelli was also working on an SQL grammar.
>>
>> Andy
>>
>>
>>
>> Alexander Brown wrote:
>>> Hi,
>>>  
>>> Perhaps this will sound like a rather stupid question, but I am
>>> wondering if there is a better way to approach the problem I am
>>> trying to solve.
>>>  
>>> I am interested in parsing SQL. I have developed a grammar based on
>>> the (overly complex) SQL2003 specification for my corpus (something
>>> like 1GB+) of SQL statements. I've also built a treewalker that walks
>>> my AST.
>>>  
>>> My application is currently converting my AST into a Java-based
>>> semantic object model that, for all intents and purposes, reflects
>>> the structure of the AST on a 1:1 basis. For my application, I need
>>> an object-model-based representation of SQL.
>>>  
>>> Building the object model and the matching StringTemplate library has
>>> been extremely time-consuming: there are something like 1000 rules in
>>> the SQL2003 spec, and I have also built composite grammars that handle
>>> a superset of the spec, such as DB-specific constructs (old-school
>>> Oracle outer join syntax, for example) and procedural wrappers like
>>> PL/SQL. My treewalker has thus become intermingled with vast amounts
>>> of Java that builds my semantic model, and my Java object model has,
>>> of course, a large number of classes. I am beginning to think that I
>>> have done this wrong.
>>>  
>>> After the horse has bolted, I am wondering: was there a better way to
>>> approach this? I am particularly keen to encapsulate my semantic
>>> representation code and embed little or no Java in my treewalker (even
>>> if the 1:1 mapping remains). I think I have missed a step somewhere.
>>>  
>>> Thanks for your input.
>>>
>>> Regards,
>>>  
>>> Alex
>>>