[antlr-interest] OO design question

Thu Jun 13 14:09:29 PDT 2002

Folks,

Let's imagine an optimal world where I'm contemplating building a 
simpler internal structure for ANTLR.  Please suspend disbelief for the 
duration of this email. :) ;) :D

Ok, so ANTLR works like this:

1. preprocess the grammar, expanding any inheritance relationships.
2. parse the grammar and build an internal graph/network representation; 
the graph looks like a syntax diagram you would build to describe a 
grammar.  A token manager tracks token type definitions and is 
associated with the graph.
3. try to generate code, which drives the grammar analysis engine.  In 
other words, when I try to generate a decision in the output parser, I 
on-demand ask the analyzer to compute the lookahead sets and then use 
that to generate decision code.  This is the same principle for html 
output, text output, java, C++, whatever.

In the current implementation this stuff is so hopelessly interwined 
that it's pretty hard to separate what is going on in your head.  
Changes in one area may have broad effects on others.

Optimally, ANTLR would be re-engineered to separate the various phases 
as much as possible for ease of specification, ease of debugging, and 
ease picking and choosing what phases you wanted to make a cool tool.  
For example, you might want to generate a pretty printer.  You could 
take the front-end that generated the graph and hook a tool on the back 
that just dumps the grammar, ignoring code gen and analysis etc...  You 
could even build an interpreted version of antlr from just the front end 
and the analysis phase, ignoring the code gen.

I imagine that you'd have

1. front end generates grammar graph and vocabulary manager; might even 
have a persistence text format so you don't even need to know antlr's 
internal data structure format--you could use your own or even use the 
text format to connect it to a tool written in another language.

2. analyzer, computes approximate pred-LL(k) lookahead information at 
every decision point in the grammar.  Again, you could have a text 
output format for debugging purposes or to use as an interface to 
completely separate tool.  This would no longer be driven by the code 
generator, though the code generator could ask for the information 
naturally.

3. various back ends such as Java, C++, html, etc...  I have lots more 
to say about this, but I need to firm up my "output grammar" ideas more 
first.

My specific question at the moment relates to what you might call 
aspect-oriented programming vs oo programming.  Consider the data 
structure representing your grammar (i.e., the graph).  It's very neatly 
organized as a set of GrammarElement subclasses that include subrules 
and atoms like token-refs.  Now, how do you perform analysis on this 
graph?  Conceptually you would like to keep the analyzer separate (so 
somebody could build a different one and for good programming practice 
reasons).  I would call this an aspect not an object encapsulation 
idea.  Anyway, currently I have added a method to each GrammarElement 
called lookahead() or some such that is supposed to compute lookahead 
for that type of element.  Works great except for the fact that now my 
analyzer is not a big chunk (an aspect of ANTLR) it is completely 
interwined with the graph data structure.  The reason of course is that 
it is really nice to ask a node for it's lookahead.  In order to 
separate out the analyzer, you'd have ask the analyzer to look at a 
node: Analyzer.lookahead(GrammarElement).  But, you'd lose polymorphism 
and have to have a switch in lookahead that asked what kind of node it 
was.  Ick.

So!  How do you get a good separable chunk called an Analyzer without 
making the implementation fully of crappy switch-statements?  I.e., how 
do I use aspect programming w/o losing the polymorphism convenience?!  
Anybody wanna lend me some smarts?

Thanks,
Ter
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/