[antlr-interest] ANTLR vs lex/yacc

Sun Jan 16 15:07:52 PST 2005

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Mike Bresnahan
> Sent: Monday, 10 January 2005 2:49 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] ANTLR vs lex/yacc
> 
> I have seen many people comment on how they dislike yacc/lex vs ANTLR.
> I'm
> curious why people think this.  My limited experience hasn't been so black
> and white.  I first learned yacc/lex about 9 years ago and I haven't used
> them since.  At the time I found them wonderful tools.  I bought the
> O'Reilly nutshell book and within a very short period of time I could whip
> out a simple parser in one day.  I found the tools very intiutive.
> 
> Speed forward to today.  I first started using ANTLR a few months ago.  I
> chose it because it generates C# and that is what I needed.  My experience
> with ANTLR has been much different than my experience with yacc/lex.  I
> have
> found it extremely difficult to get my head around the tool.  I don't
> think
> the problem is the tool per se, but rather LL(k) grammers.  I don't find
> them to be intuitive.  The grammers are harder for me to read and write.
> 
> One very hard concept is that I cannot use left recursion, however "middle
> recursion" and "right recursion" are just fine.  Why is left recursion
> treated differently?  It just seems asymeteric and ad-hoc to me.
> 
> So either I've gotten stupider over the years, my brain is more in tune
> with
> LALR grammers, or LL(k) grammers are simply harder to understand than LALR
> grammers.  Why is it that people like ANTLR more than yacc/lex?  Why have
> I
> had a different experience?
> 
> Background note: I've never studied the theory of compilers nor do I have
> a
> degree in computer science.  However, I have been programming for 20 years
> or so; 10 professionally.
> 

Well I have been using ANTLR for around 6 months now, although mostly in the
last month has it become essential to the things I am doing. I have been a
(casual, non-commercial) programmer for around 15 years in various
languages.

I learnt LL(k) first because of ANTLR, and although ANTLR can take some time
to learn, I think that anyone with some Compiler Toolkit experience will
quickly be able to understand the concepts. I think maybe clearer
documentation and HOWTO guides might be the way to go since I found the
reference guide too heavy-handed and the tutorials are only sufficient to
get someone started on simple problems. For example, once I got to Abstract
Syntax Trees it was difficult to understand how then to use them to achieve
different results, particularly when it comes to building a complete
compiler for a programming language. However, ASTs are a common approach
used in probably just about every compiler. 

Anyway, in the last two weeks, I tried learning LR and its variants, like
LALR, because of the LL(k) left-grammar problem. However, I found *LR* to be
quite non-intuitive, because they are bottom-up, rather than top-down, and
my ANTLR-centric way of thinking made it hard to visualise what was going
on. However, once I sat down with a decent book or two from the library, it
became very simple to see how both types of parsers work.

The key thing to remember with both LL(k) and LR is that they have to match
terminals - the non-terminals are simply there to group
terminals/non-terminals into patterns (with/without recursion). With LL(k)
you have to look at every possible alternative diving down into the rules -
if I see TOKEN1 then I will match one path but if I see TOKEN2 then I will
match a different path. Non-determinisms result from not being able to
predict a path with the current amount of look-ahead - this occurs because
of explicit choices in a rule, or because of a ()?, ()* or ()+ component
where you might match the alternative or "exit". 

Similarly, with LR you have to match one or more terminals before a "reduce"
can occur. You have to use the top-down approach first to "predict" what
alternatives might follow, especially in LR(1) and LALR(1) parsers.
Terminals at the front of the stream that cannot be matched are pushed down
into the stack temporarily until a suitable non-terminal can be matched
later on. This type of approach favours left-recursion, because you want to
match as much on the left first so that you can then match more complicated
non-terminals over time.

So its unfortunate that most grammars are LR-oriented, because
left-recursion is a pain to eliminate in LL(k) parsers. However, I think the
results are going to be fairly equivalent whether you choose LL(k) or LR.
Personally, I find that ANTLR is a fantastic tool, and can deal with all
sorts of conflicts and ambiguities using the syntactic predicates and
semantic predicates, and my (limited) experience with other tools is that
they cannot match ANTLR in terms of flexibility and power. It's really just
a matter of diving in and learning how best to use the tools! Whenever I
create a grammar of my own, I think I would prefer it to be LL(k) because it
makes it many times easier to see where the grammar is ambiguous.

Regards 

Nigel

--
Nigel Sheridan-Smith
PhD research student

Faculty of Engineering
University of Technology, Sydney
Phone: 02 9514 7946
Fax: 02 9514 2435