[antlr-interest] Two More Bugs in ANTLRv4

Wed Feb 29 09:00:54 PST 2012

>  It's basically saying that
> one possible form of expr is two exprs (which in turn can each be
> another two exprs).  With no other distinguishing tokens there's
> no way to exit that path.  So that way lies infinite recursion,
> madness, and/or Sparta.

Hmm, looks like a perfectly fine grammar to me.
Not one a top-down parser would like, of course,
but no problem for bottom-up algorithms. expr
can always produce 'x', so there's nothing "infinite"
other than the normal ability of a grammar to produce
an infinite number of strings (it's not non-terminating).

> Was this from a real usage attempt (if so, please explain more
> what you were trying), or were you just experimenting?

Awk introduced implicit concatenation about 35
years ago. Not sure if there are any earlier examples of
the implicit binary operator.

This is a one-off case, but a well-established one.
A parser generator that truly supported expressions
in their "natural" language should not only allow
the implicit operator, but allow the user to specify
its precedence and associativity, IMHO.

This is certainly doable bottom-up. One knows when an
expr has been seen and that the next token could start
an expr and can therefore "pretend" that the implicit
operator exists. Haven't thought about doing it top-down. I think
the harder problem is intelligent guidance so the user doesn't
make a mess. For example, it's problematic to try to add
this operator to C, and the parser generator has to know
quite a lot about expressions (apart from grammars) in order
to give an intelligent explanation of why that is (can't use
implicit binary if an outfix operator is overloaded to be a
postunary operator as well, as is the case in C with
'(X')' and 'X(X)').

How does Awk do it then, since it also uses parens
for both grouping and function call? Unlike C, Awk
doesn't allow a function "name" to be an expression.
(Awk source code at google: The One True Awk).
More complications! (Although, complications like
that particular one get unified, I think, if you use LR(0)
and then only examine inadequate states to see what
problems are left over for precedence/associativity
to attempt to resolve).

This is exactly why parser generators don't "really"
support expressions. To really go into that domain
requires embedding a lot of extra knowledge that's
pretty hard to get right, since it's not on a solid
mathematical foundation to start with. If you
don't try to jump in with both feet and handle
all the messes, then you only "sorta" support
expressions, like Yacc/Bison, and have to
(like them) leave a note mentioning that the user's
probably going to get into trouble if s/he strays outside
a few well-established usages.