[antlr-interest] Working around the LALL(k) vs. LL(k) problem ?

Wed Mar 20 15:11:33 PST 2002

On Wed, Mar 20, 2002 at 10:03:18AM -0800, mzukowski at bco.com wrote:
> > -----Original Message-----
> > From: Randall Nortman [mailto:antlr-list at wonderclown.com]
> > 
> > I'm working on my first ANTLR grammar, and I keep getting bitten by
> > the fact that ANTLR uses linear approximation of LL(k).  I'm wondering
> > if there are any general strategies for avoiding or working around
> > this problem.  Can predicates be used?
[...]
> 
> statement: (IDENTIFIER IDENTIFIER)=>variableDecl | expression;
> 

Of course!  I was trying to put the predicate on variableDecl instead
of statement.  Sorry, I should have figured that one out on my own.

> > On a related note, I'm designing both the syntax of this language and
> > the parser together, using ANTLR along the way to catch problems in my
> > syntax.  However, I feel like I spend as much time trying to figure
> > out why ANTLR doesn't like a rule as I do developing the syntax
> > itself.  I never know if an error message from ANTLR is because my
> > syntax is bad or because I'm just not representing it in a way that
> > ANTLR likes.  Might there be a better tool for developing the grammar
> > independent of the parser?  (I want to make sure I have a regular
> > grammar, which can be parsed without referencing any semantic
> > information such as symbol tables, so the tool should be able to
> > validate this.)
> 
> If you look at the generated code, it's usually pretty easy to figure out
> what antlr is trying to do.  What other problems have been on the antlr side
> instead of the bad syntax side?  I'm not familiar enough with other tools to
> recommend any alternatives. 

As another example, here's one of the things I ran into early on that
confused me.  It was a case of '(A|B)|C' being different from
'A|B|C'.  The following testcase is an extreme simplification of the
grammar I originally wrote that caused the problem:

  test: (ID | paren) | ID paren;

  paren: "(" test ")";

As such, there is a nondeterminism on 'test'.  Remove the grouping
parenthesis, and it works just fine.  This was very non-intuitive to
me, and took many hours to track down.  (The situation was
considerably more complex in real life, but this is what I eventually
narrowed it down to.)

I'm not knocking ANTLR at all; I think it's a wonderful system.  As I
get used to the way it works, I'm having fewer and fewer problems.
But what I ideally want in this stage of language design is to simply
write straight EBNF (no predicates), and have something to analyze it
and tell me if the grammar is ambiguous or not.  Once I'm sure I've
got a good grammar, then I can worry about how to parse it.  Doing
both at once is a little inconvenient.  (Not to mention that having
redundant information, like the predicate above that solves my
problem, clutters up the grammar, making it harder to read.)

Thanks for the help.

Randall Nortman

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/