[antlr-interest] the nihilistic circle hoist

Wed Jan 5 05:43:19 PST 2011

> That's not a bug but a limitation of Java and most other targets; I think the book has a good description.

If we're talking about bug#2 (skip ahead if not! :-), then...

After (re)reading the book, the only thing I
could figure out you might be referring to is the final
paragraphs of Chapter 13. But those are about the
issues pretty well covered by Sam's prior post of limitations.
The predicate "true" does not "reference anything not
visible to all rules" or violate any other constraint
I've heard of :-).

Maybe a more extreme example can make the
problem clear:

----------------------------------------------------------
// this grammar is for this simple language:
// '(' ID+ ')' ( ':' ID+ )?
grammar hoist1;

WS  : (' '|'\t'|'\n'|'\r')+ {skip();};
ID  : ('a'..'z')+;

start : mainprog optional? EOF;

mainprog : '(' args ')';
args : ID ID* ;

// I've decided to "disable" this optional construct
optional: {false}?=> ':' element element;
element : {true}?=> args ;
----------------------------------------------------------

* predicates completely safe (only true/false)
* compiles (target language has no complaints)
* works sometimes! (accepts string "( x )")
* fails other times.

(the {false} predicate is there just to make the
resulting output as disturbingly incorrect as
possible; you can remove it and still get the
same wrong output).

When given an input string of "( x x )" the
generated parser says:
   line 1:4 extraneous input 'x' expecting ')'
The reason is that the predicate is being
executed far outside its correct
context. Predicate hoisting is significantly
harder than the current ANTLR algorithm
thinks it is. :-)  I really don't see any way
to implement it correctly without either
nontrivial grammar transformations or
a runtime stack of active predicates.

But all that's about bug#2.

Back to bug#1, where hopefully it should be much
simpler for me to see if I'm confused.

> I get these results with 3.3.  Hmm... it looks like the decision for FOO FOO* and FOO+ also gets the same thing. which version are you using?

The header line of my ANTLR output says:
   // $ANTLR 3.3 Nov 30, 2010 12:50:56 Simple.g 2011-01-04 12:20:14

Maybe you're saying the DFA is the same so "no problem",
while I'm saying the two parsers accept different languages,
so "problem"?

Here's the exact file I just tested:

grammar hoist1;

WS  : (' '|'\t'|'\n'|'\r')+ {skip();};
ID  : ('a'..'z')+;

start : element* EOF ;
element : {true}?=> pre ;
pre : ID+ ;

When I give this grammar an input string of "x", I get:

line 1:0 required (...)+ loop did not match anything at input 'x'

When I change change the "ID+" to "ID ID*", the resulting
parser happily accepts any number of space-separated x's.
Is that not the same result you get?