[antlr-interest] Very general question, might require a book as an answer

Sun Jan 21 14:27:50 PST 2007

Dear list,

I am involved in the Quantum DB eclipse plugin  for quite some time now. 
This is a plugin for uniform access to databases that have a JDBC driver 
(http://sourceforge.net/projects/quantum). I joined the development team 
for the frustration I kept getting with queries that contained errors 
that could have been solved at 'compile' time rather than at 'runtime'. 
This, imho, means that the syntax of the statement(s) need to be 
checked, which means a parser.

As you may be aware SQL comes in many dialects and our plugin aims to 
offer support for all of them.

So, I looked around and found antlr. It seemed very fit for the job as 
it allows inheritance. As plugin developers we deliver a base and allow 
others to enhance/specialize that for their specific database.

I got started, looked at some .g files, borrowed some constructs and got 
stuck on a very basic point: all the tables, columns, aliases and so on, 
were recognized as identifiers only. Once the statement was parsed 
'successfully', it was not possible to reason about it as the parser did 
not deliver tables involved, columns belonging to those tables and so 
on. (I think you would call this an interpreter...)

I posted to this list about 'promoting' certain tokens, and of course 
and thank you all, received an answer: promote them by adding an action:

<pre>
column_alias
    :
    i:id {#i.setType(COLUMN_IDENTIFIER_ALIAS);}
    ;
</pre>

So that my id gets promoted to a column alias whenever this rule executes.

I got this to work, finding out errors even when the syntax worked out 
ok (misspelled table and column names, wrong relationships), but it 
seemed to me that I had not unleashed the complete power of antlr.

So, now, finally, I come to the questions that may require a book (or 
2)  to answer:

1) Should I be using antlr, or should I stick with the stuff eclipse offers?
2) Is antlr the right tool of choice for a project in which each 
database vendor speaks its own dialect? In other words will the 
inheritance feature deliver the promise of less coding? And of course, 
what should be the base grammar of this all? And could we have one lexer?
3) We would like to support scripts, some dialects have statement 
separators, others do not. Does this mean I need to write something that 
separates the statements first, or is there a smarter way?
4) I never used the tree walker classes in antlr. I must admit, I do not 
understand the value of it yet. However I think they are what I need 
because after lexing and parsing I want to interpret the results. So far 
my AST is not a tree but a list. What are the benefits of using tree 
walkers?
5) Why should I upgrade to v3?

If anyone could find the time to answer even one of my questions, that 
would be greatly appreciated.

Kind regards,

Jan