[antlr-interest] ANTLR vs lex/yacc

Mon Jan 17 04:35:38 PST 2005

On Sun, 9 Jan 2005 21:49:13 -0600, Mike Bresnahan <mike at fruitioninc.com> wrote:
> Speed forward to today.  I first started using ANTLR a few months ago.  I
> chose it because it generates C# and that is what I needed.  My experience
> with ANTLR has been much different than my experience with yacc/lex.  I have
> found it extremely difficult to get my head around the tool.  I don't think
> the problem is the tool per se, but rather LL(k) grammers.  I don't find
> them to be intuitive.  The grammers are harder for me to read and write.

I guess this is something to get used to. LL(k) works top down and
LALR works bottom up in matching. (Can't add anything to the excellent
post of Nigel) Some things you get for free in LALR and not in LL(k)

> One very hard concept is that I cannot use left recursion, however "middle
> recursion" and "right recursion" are just fine.  Why is left recursion
> treated differently?  It just seems asymeteric and ad-hoc to me.

It might be enlightening to just read a small antlr generated parser
to see how it matches things (or read a text book on LL(1)/LL(k)). For
instance the calc example. It is really easy to see how an antlr
parser/lexer moves through the input.

> So either I've gotten stupider over the years, my brain is more in tune with
> LALR grammers, or LL(k) grammers are simply harder to understand than LALR
> grammers.  Why is it that people like ANTLR more than yacc/lex?  Why have I
> had a different experience?

Before switching to antlr I've used lex/yacc, slade (an LL(1) parser
generator of the university of twente) and cocktail (has both LALR and
LL(1) parsers based on Modula 2 though which sucks) When I switched to
ANTLR I needed quite a lot of time to get used to some things. A big
handicap is the docs, in general everything you need is in it, but the
structure is somehow very hard to cope with. I'd read it all to start
with even if you don't get everything at start.

> Background note: I've never studied the theory of compilers nor do I have a
> degree in computer science.  However, I have been programming for 20 years
> or so; 10 professionally.

In a sense that's a good background for getting a hang of antlr. Start
reading the generated code and trace through it a few times and things
will start dawning is my guess. One of the big advantages of antlr is
the readability of it's generated code. As a result it's really easy
to debug programming errors in action code. If you debug a yacc parser
then you often think how the heck did I end up here. If you debug an
ANTLR parser you directly can see the lookahead in a rule and combine
that with the follow sets that are used to predict which rule should
be entered next. This gives a lot of intuition as to where problems
are (at least for me it does)

Some other advantages: 

- One algorithm for generated lexers/parsers/treeparsers. They're all
the same. (some small oddities left aside: for a lexer all the token
rules are or'ed into a nextToken method which will give some
unexpected ambiguities for the the newcomer, treeparsers only use a
lookahead of 1 which is also unexpected for some)
- You get much better error messages/handling for practically free
compared to lex/yacc. (although maybe lacking a bit in flexibility,
but as far as I've seen tweaks are hardly ever needed to improve error
messages/handling)
- ANTLR allows you to cheat. If you run into an ambiguity you don't
see how to fix easily you can always use a syntactic predicate to
patch things up so you can get on with it. Later when the complete
grammar is finished you can start removing them when you got the
complete picture.
- You get a lexer that can parse too.... Ok this is in a sense nasty
too. You can sometimes solve things you can't fix in the parser by
doing a little more in the lexer. For newbies it's often the problem
not doing too much in the lexer.
- Once you get the hang of things you start seeing ANTLR as a general
purpose parsing language ;) Especially when dealing with tree parsers
it's really convenient to use tree parsers inside treeparsers. It just
behaves like a function call.
- You get tokenstream filtering. You can stack parsers so one parser
becomes the lexer for the other thus filtering the tokens that come
from a lower layer to the upper layer. This allows to tackle some
really nasty parsing problems.

Currently I'd only grab lex/yacc if I needed speed. ANTLR3 will
probably fix that though. LALR is not necessary faster than LL(1) /
LL(k).

Hope this helps,

Ric