[antlr-interest] Fwd: Pruning the Parse Tree

Mon Mar 10 22:15:31 PDT 2008

---------- Forwarded message ----------
From: Aaron Armstrong <ae.armstrong at gmail.com>
Date: Mon, Mar 10, 2008 at 10:14 PM
Subject: Re: [antlr-interest] Pruning the Parse Tree
To: Richard Clark <rdclark at gmail.com>
Cc: Guntis Ozols <guntiso at latnet.lv>, antlr-interest at antlr.org

I'm not trying to pick and choose.  In fact, I would like to preserve all
the original code elements.  Parser grammars can give you a lot of baggage
(for example, separating method declarations and formal parameters out).
Right now, I have most of the base algorithm written and I just need the
parser to properly chop up the code.  I'm not interested in putting the
algorithm (which requires passing over nodes several times) into ANTLR
actions; that would be extremely messy.

When I took my compilers class, I learned the distinct parts of a
compiler: first the lexer, then the parser, then the AST, then the symbol
table, semantic checking, and finally output.  With our parser generator
(JCup) it seemed like creating the AST was easy: each rule was a node, and
the node's children were the lexer or parser tokens that made up the rule.
I would be very happy with output like this.  When I learned that ANTLR
outputs a flat AST by default, and that I would need to write another
grammar to produce an AST of this nature, I was not happy.  I did take some
time to write AST output for importDecl; after successfully kludging
something together for this one rule (out of around 50), I decided this
would not work.

Then I did some more reading in TDAR (The Definitive ANTLR Reference) and
learned about parse trees.  Thinking this would meet my simple needs, I
followed the example given on the
website.  At first I got a NullPointerException following the example, but
I've worked with it more and I no longer get that.

I have been happy with the Parse Tree output.  It's just that it gives me
these extra nodes.  ANTLRWorks can recognize these extra nodes (and colors
them differently).  If someone knows how to recognize which nodes are extra,
I would be very grateful.
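
One possible reading of "extra" here is a rule node whose only child is
another rule node, i.e. a pass-through link in a chain such as
expr -> term -> factor.  That is an assumption; the message does not say
which nodes ANTLRWorks colors differently.  A minimal Java sketch under that
assumption follows.  Node, ParseTreePruner, and prune are hypothetical names
used for illustration and are not part of the ANTLR runtime.

```java
import java.util.ArrayList;
import java.util.List;

final class ParseTreePruner {

    /** Hypothetical stand-in for a parse-tree node; NOT an ANTLR class. */
    static final class Node {
        final String label;      // rule name, or token text for leaves
        final boolean isToken;   // true when the node was built from a token
        final List<Node> children = new ArrayList<>();

        Node(String label, boolean isToken) {
            this.label = label;
            this.isToken = isToken;
        }

        static Node rule(String name, Node... kids) {
            Node n = new Node(name, false);
            for (Node k : kids) {
                n.children.add(k);
            }
            return n;
        }

        static Node token(String text) {
            return new Node(text, true);
        }

        /** LISP-style rendering, e.g. (rule child child). */
        String toStringTree() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) {
                sb.append(' ').append(c.toStringTree());
            }
            return sb.append(')').toString();
        }
    }

    /**
     * Returns a pruned copy of the tree. Assumption: a rule node is "extra"
     * when its only child is another rule node, so it is replaced by that
     * child. Token leaves and rules with several children are kept.
     */
    static Node prune(Node node) {
        if (node.isToken) {
            return node;
        }
        List<Node> kept = new ArrayList<>();
        for (Node child : node.children) {
            kept.add(prune(child));
        }
        if (kept.size() == 1 && !kept.get(0).isToken) {
            return kept.get(0);   // skip the pass-through rule node
        }
        Node copy = new Node(node.label, false);
        copy.children.addAll(kept);
        return copy;
    }

    public static void main(String[] args) {
        Node tree = Node.rule("assignment",
                Node.token("x"),
                Node.token("="),
                Node.rule("expr", Node.rule("term", Node.rule("factor", Node.token("1")))));
        System.out.println(tree.toStringTree());        // (assignment x = (expr (term (factor 1))))
        System.out.println(prune(tree).toStringTree()); // (assignment x = (factor 1))
    }
}
```

Every token survives the pruning, so the original code elements are
preserved; only the single-child rule links are dropped.  Keeping the
deepest rule in each chain is one choice among several; keeping the
outermost rule instead would work equally well.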

I understand that features like rewrite rules and StringTemplates allow for
more expressivity.  These are interesting and powerful concepts for writing
new languages.  In fact, ANTLR could probably represent all of my work so
far. However, tools this powerful require much time to master.  I just need
ANTLR to properly break up the code and give me the AST (or Parse Tree in
this case).  I can take care of the rest.

Thank you for reading my rant.

On Mon, Mar 10, 2008 at 7:22 PM, Richard Clark <rdclark at gmail.com> wrote:

> So Aaron, what are you trying to identify in the target language?
> If you need to pick and choose, why not write a filter using ANTLR?
> Filters use lexer rules to identify and extract parts of larger files.
>
> Krugle uses ANTLR filters to extract method declarations from multiple
> languages. I've used filters to extract database table definitions from a
> giant mix of table defs and procedural code (over 10,000 lines) and then
> used the lexer definitions as the base of a SQL dialect translator.
>
> With as many struggles as you seem to be having, the right answer is
> usually to back up and look at other ways to use the tool. (Years of working
> tech support taught me this.)
>
> ...Richard
>