[antlr-interest] Re: ANTLR vs lex/yacc, debugging and ANTLR 3
Ian Kaplan
iank at bearcave.com
Sun Jan 16 15:50:17 PST 2005
> I have seen many people comment on how they dislike yacc/lex vs
> ANTLR.
I can't speak for the multitudes of ANTLR users in their preference
for ANTLR over YACC, but I did write a web page "Why Use ANTLR"
which can be found here:
http://www.bearcave.com/software/antlr/antlr_expr.html
I find ANTLR more sophisticated in the way it handles synthesized
results (results of grammar productions) and arguments passed down
the grammar hierarchy. In a similar vein, I like the way local
initialization code blocks can be defined. ANTLR also generates
readable code, in contrast to YACC's tables, so you can see
that it is generating what you intended.
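To make the point about passed-down arguments and synthesized results concrete, here is a minimal hand-written sketch in Java of the recursive-descent style that ANTLR's generated parsers follow. It is not actual ANTLR output. The class name ExprSketch, the rule names, and the tiny expression grammar are all assumptions made up for illustration. Each rule is a method: the env map is the argument passed down the grammar hierarchy, and the int return value is the synthesized result.

```java
import java.util.Map;

// Hand-written illustration only; names and grammar are hypothetical, not ANTLR output.
class ExprSketch {
    private final String src;
    private int pos = 0;

    ExprSketch(String src) { this.src = src; }

    // expr : term ( ('+' | '-') term )* ;
    // 'env' is passed down to sub-rules; the returned int is the synthesized result.
    int expr(Map<String, Integer> env) {
        int value = term(env);
        while (peek() == '+' || peek() == '-') {
            char op = next();
            int rhs = term(env);
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // term : atom ( '*' atom )* ;
    int term(Map<String, Integer> env) {
        int value = atom(env);
        while (peek() == '*') {
            next();
            value *= atom(env);
        }
        return value;
    }

    // atom : NUMBER | IDENT | '(' expr ')' ;
    int atom(Map<String, Integer> env) {
        char c = peek();
        if (c == '(') {
            next();
            int value = expr(env);
            expect(')');
            return value;
        }
        int start = pos;
        if (Character.isDigit(c)) {
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return Integer.parseInt(src.substring(start, pos));
        }
        if (Character.isLetter(c)) {
            while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
            String name = src.substring(start, pos);
            Integer bound = env.get(name);
            if (bound == null) throw new IllegalArgumentException("unbound name: " + name);
            return bound;
        }
        throw new IllegalArgumentException("unexpected '" + c + "' at " + pos);
    }

    // Skip whitespace and return the next character without consuming it ('\0' at end).
    private char peek() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // Consume and return the next non-whitespace character.
    private char next() {
        char c = peek();
        pos++;
        return c;
    }

    private void expect(char want) {
        if (next() != want) throw new IllegalArgumentException("expected '" + want + "'");
    }

    // Parse the whole input and reject trailing characters.
    static int eval(String text, Map<String, Integer> env) {
        ExprSketch p = new ExprSketch(text);
        int v = p.expr(env);
        if (p.peek() != '\0') throw new IllegalArgumentException("trailing input at " + p.pos);
        return v;
    }
}
```

Because every rule is an ordinary method, a failure can be followed in a debugger or read straight off the source, which is the readability advantage over a table-driven parser.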
The issue of where the recursion is (left or right) does not have
any traction for me. I find it easy enough to write the grammar
either way. And I'm not a grammar theorist either.
Debugging both YACC and ANTLR can be difficult. I was reminded of
this recently while working on an ANTLR grammar. Because errors are
reported as a result of how the grammar logically expands, with both
YACC and ANTLR you get an error reported for a grammar location that
may be far away from the place in the grammar that caused the
problem. This makes grammars painfully difficult to debug. Here I
don't find ANTLR much of an improvement over YACC.
While on this topic: I've been meaning to write about the issue of
debugging grammars. So I guess that I'll use this as an
opportunity.
I've just finished a grammar for a query language. It has
expressions at a number of different levels and these expressions
are recursive. The grammar has been painful to debug. The only way
I could debug it was to start with the core expression and keep
adding the productions above it until I found the production that
broke the grammar. This is very time consuming and unpleasant.
It is very, very difficult to understand the cause of a problem in a
complex grammar. This understanding is equivalent to expanding out
the grammar trees until you understand where the problem lies. In
practice this is very difficult to do.
There has been some discussion here about ANTLR 3. In considering
the features for ANTLR 3, I would concentrate on the core of what I
believe is the advantage that ANTLR delivers: generation of parsers.
I wrote my own scanner for this query language. It really was not a
big deal. Scanners are pretty easy to write. ANTLR makes it easy
to integrate a scanner with the parser. I would not complain if the
scanner generation capability disappeared from ANTLR. The
concentration on scanner performance in ANTLR 3 is, I think,
misplaced. There is only so much time and I think it is a good idea
to concentrate on the core advantages of ANTLR: parser generation.
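As a rough illustration of how little a hand-written scanner needs, here is a minimal sketch in Java. It is not the scanner described above. The class name ScannerSketch, the token kinds, and the operator set are assumptions invented for the example, and the adapter needed to feed these tokens to an ANTLR parser is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner; token kinds and symbols are illustrative assumptions.
class ScannerSketch {
    enum Kind { IDENT, NUMBER, STRING, SYMBOL, EOF }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + ":" + text; }
    }

    // Convert the input into a token list terminated by an EOF token.
    static List<Token> scan(String src) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (Character.isLetter(c) || c == '_') {
                // Identifier: letter or underscore, then letters, digits, underscores.
                while (i < src.length()
                        && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) i++;
                out.add(new Token(Kind.IDENT, src.substring(start, i)));
            } else if (Character.isDigit(c)) {
                // Unsigned integer literal.
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                out.add(new Token(Kind.NUMBER, src.substring(start, i)));
            } else if (c == '\'') {
                // Single-quoted string with no escape handling.
                i++;
                while (i < src.length() && src.charAt(i) != '\'') i++;
                if (i >= src.length())
                    throw new IllegalArgumentException("unterminated string at " + start);
                out.add(new Token(Kind.STRING, src.substring(start + 1, i)));
                i++; // consume the closing quote
            } else {
                // Try two-character operators before single-character symbols.
                if (i + 1 < src.length()) {
                    String two = src.substring(i, i + 2);
                    if (two.equals("<=") || two.equals(">=") || two.equals("!=")) {
                        out.add(new Token(Kind.SYMBOL, two));
                        i += 2;
                        continue;
                    }
                }
                if ("()=<>,*".indexOf(c) < 0)
                    throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
                out.add(new Token(Kind.SYMBOL, String.valueOf(c)));
                i++;
            }
        }
        out.add(new Token(Kind.EOF, ""));
        return out;
    }
}
```

A real query language would add keywords, comments, and line/column tracking for error messages, but the structure stays a single loop with one branch per token class.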
I will not bore y'all again with my discussion on tree generation,
except to state that this is not a feature I use either and would
not be sad to see less time spent on it.
What would have saved me several days of work is better features for
debugging the grammar. Perhaps something that would help expand out
the grammar so I can see where the problem lies.
Except for the difficulty of debugging ANTLR grammars, I'm pretty
happy with ANTLR. There is not a lot that I can think of that I'd
like to change.
What would help people in general, Terence, is if you finished your
book. One of my colleagues, an Oracle specialist who worked
with me on the back end of the query language, tried to make a start on
the query language parser. He pretty much failed. He had no idea
of how to proceed. Most people who have not written parsers before
or used parser generators are going to have a pretty difficult time
with ANTLR. The existing documentation does not provide much to go
on and web pages like mine are of limited help as well (my web pages
concentrate on a C++ parser and our query language parser is
targeted at Java).
Ian Kaplan