[antlr-interest] Repost: ANTLRworks: Why do these rules behave differently in the embedded interpreter?

Fri Jan 1 11:10:54 PST 2010

Hi,

I originally posted the question below on 13 December.  I'm guessing I
didn't get any replies because it rolled off the end of everyone's
inbox during the holiday seasons.  So please excuse the repost; I'd be
grateful if someone could tell me whether I'm on the right track.
Since posting this question, I have observed similar (not identical)
behavior in the ANTLR IDE for Eclipse.  My guess (please confirm or
debunk) is that the built-in interpreters build the concrete syntax
tree by (correctly) pursuing the first viable alternative at each
decision point but (unfortunately) not necessarily rewinding the input
stream upon encountering an exception.  Since posting this question,
at least one other person has independently encountered the same
problem, in connection with Scott Stanchfield's excellent ANTLR 3
video tutorials [ http://javadude.com/articles/antlr3xtut/index.html
].  I've been using ANTLR for a little over a year, almost exclusively
by running the ANTLR tool from teh command line.  I'm just a CLI guy.
So I'm encountering questions with ANTLRworks perhaps later than I
should.

Now, here's my previous post, with new comments indicated in square brackets:

This question is so rudimentary that I am almost embarrassed to ask.
But since I almost never try to use ANTLRWorks for my parsers, I'll
risk injuring my pride in exchange for learning something.

If I paste the Expr.g *verbatim* from
http://www.antlr.org/works/help/tutorial/content/Expr.g into
ANTLRWorks 1.3.1 and feed it the following test input:

3+1
3-1

both run (via the Run menu) fine and produce the expected numerical
outputs.  But for the same test input, the ANTLRWorks interpreter
produces the expected parse tree for only 3+1 and gives a
MisMatchedTokenException on the '-' in 3-1.  If I reverse the '+' and
'-' alternatives in rule expr, the results are also reversed: it's the
second alternative that goes bad in the ANTLRWorks interpreter.

Thinking this might have something to do with the embedded actions
which the interpreter does not understand, I stripped them all out.
That leaves us with the following rule, for which the interpreter runs
without error on our test input:

expr
  :  multExpr ( ( '+' multExpr | '-' multExpr ) )*
  ;

[This is potentially ambiguous.  Does a token bind more tightly to
another token, or to the binary operator '|' for alternatives?  Yes,
we know the official ANTLR answer, but I'm questioning my
understanding of the specific implementation embodied in ANTLRworks.
See next rule.]

So I figured [maybe wrongly?] I was right about actions causing
problems.  But wait.  Let's dig deeper.  This second rule

expr
  :  multExpr ( ( '+' multExpr ) | ( '-' multExpr ) )*
  ;

works in the interpreter as expected for the first alternative (used
for 3+1) but produces a MisMatchedTokenException for the second
alternative (used for 3-1).

And better yet, this third rule

expr
  :  multExpr ( ( ( '+' multExpr ) | ( '-' multExpr ) ) )*
  ;

works great in the interpreter for both 3+1 and 3-1, just like the
first rule does.

All three rules actually run (from the Run menu) as expected.  Of
course, running them isn't very interesting with the actions stripped
out, but they do run without error.  So I suspect that they would all
produce equally viable parsers outside ANTLRWorks, but I have not
checked.  Have I stumbled onto an issue with the interpreter embedded
in ANTLRWorks, or have I done something silly? (Or both?)

Thanks [and Happy New Year],
Kyle