[antlr-interest] De-emphasizing tree grammars?

Jason Osgood jason at jasonosgood.com
Thu Dec 22 12:11:24 PST 2011


Terence wrote:

> Recently I had the chance to reevaluate and try out a simple new mechanism based
> upon parse trees, which can be automatically generated. Now, I'm doing both a DOM
> and a SAX event type model, which is suitable for lots of different applications. For
> example, if you want to collect information about all Java declarations, you can
> simply pass in a listener to the Java parser.

That would explain why I didn't get an answer.

fado's & DebugEventListener
http://www.antlr.org/pipermail/antlr-interest/2011-November/043436.html

"Wait for ANTLR v4" would have sufficed.

---

I fully endorse the use of parse trees over ASTs. At least for my code
generation work.

Prior to LL(*), ASTs were kind of necessary, but difficult, because
the parse tree varied so much from the desired AST. Rather than
wrestle with tree construction, I settled on backing my grammars with
a Builder and inlining Java code. Not much better really.

But now with LL(*), parse trees are awesome. Flexible, expressive,
concise. I love working this way. I don't know enough about grammars
and parsing to know if LL(*) parse trees can feasibly replace ASTs,
but I wouldn't be surprised.

---

The reason LL(*)'s parse trees are awesome is because of the clear
boundary/interface between grammar and post-processing.

With both AST tree construction and my inlined Builder kludge, stages
of the processing pipeline bleed into each other, making the
end-to-end brittle and changes very expensive.

---

As I mentioned in my initial question, I created an XPath-lite API.
While I love XPath, I hate all the XML and XPath libraries with the
burning passion of a billion suns. Wrappers, handlers, contexts, ad
nauseum. (That hatred extends to all the mutant DOM implementations,
though less so against XOM. Which is why I created LOX, lightweight
objects for XML.)

Most of what you'll do is drill down a parse tree and iterate over the
results. So my XPath-lite looks kinda like these (sorry if formatting
gets lost):

  for( ParseNode child : parent.find( "/rule1/rule2" ))
  {
      System.out.println( child.getText() );
  }

  for( String literal : parent.findString( "/ruleA/ruleB/Literal" ))
  {
      System.out.println( literal );
  }

My XPath-lite "syntax" is globbing, plus recognition for literals,
minus all the XPath overkill for relative, nth offsets, functions
(it's better to do all that in Java). (Globbing, swiped from shell
script path handling, is stuff like "*/ruleName/*" and
"ruleName/**/ruleName".)

I'm told my path strategy is similar to Groovy's gpath stuff. (Which
would probably be great, were it not for the unfortunate association
with the Groovy language.)

I don't know if the "process/inspect parse tree with xpath-like
grammar" strategy has legs. I've done LOTS of ETL work, so it feels
very natural to me now. And it works really well for extracting the
interesting bits out of source code (e.g. SQL, Java, HTML). I also use
this strategy to replace the interesting bits.

But this strategy probably only works when you want to partially
process a parse tree. It probably doesn't work well (ie complexity
explosion) for 1:1 translation or compilers.


Cheers, Jason


More information about the antlr-interest mailing list