[antlr-interest] the nihilistic circle hoist

Wed Jan 5 07:50:46 PST 2011

Here's my analysis of the problem. First and foremost, I'm treating the rule
"args" as though it were written "args : ID+;"

Problem 1: The rule "args" is not always executed behind the same predicate.
This is easily corrected by adding a second rule "args2" and modifying the
"element" rule as follows:

element : {true}? => args2;
args2 : ID+;

Problem 2: If we take a detailed look at the decision making for "optional",
we see a major problem. Removing predicates and inlining, we see the
following syntactic structure:

optional : ':' ID+ ID+;

After matching one ID element for the first instance of "ID+", if the next
input element is "ID", the grammar must decide whether to match that with
the first instance of "ID+" or to move on to the second. Without predicates,
this is a grammatical error. However, the predicates you provided clearly
indicate that on the condition "true", the loop should be terminated and
move on to the next instance of "ID+" (contained in the second "element").
This condition *must* be checked as part of the loop condition in the rule
"args".

Solution: Ignoring the only truly acceptable solution of not using syntactic
predicates, you should add the following to my list of rules about semantic
predicates.

1. A semantic predicate which unconditionally evaluates to "true" during
*any single execution of the grammar* behaves almost as though it weren't
included in the grammar. By removing the predicate from the rule "element",
the grammar compiler reports a warning for the rule "args" which clearly
indicates the rule "args" will not behave as you expect. This is the
specific rule where the decision error was observed - the semantic predicate
suppressed a critical warning.

Thanks,
Sam

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Ron Burk
Sent: Wednesday, January 05, 2011 7:43 AM
To: ANTLR Interest
Subject: Re: [antlr-interest] the nihilistic circle hoist

> That's not a bug but a limitation of Java and most other targets; I think
the book has a good description.

If we're talking about bug#2 (skip ahead if not! :-), then...

After (re)reading the book, the only thing I could figure out you might be
referring to is the final paragraphs of Chapter 13. But those are about the
issues pretty well covered by Sam's prior post of limitations.
The predicate "true" does not "reference anything not visible to all rules"
or violate any other constraint I've heard of :-).

Maybe a more extreme example can make the problem clear:

----------------------------------------------------------
// this grammar is for this simple language:
// '(' ID+ ')' ( ':' ID+ )?
grammar hoist1;

WS  : (' '|'\t'|'\n'|'\r')+ {skip();};
ID  : ('a'..'z')+;

start : mainprog optional? EOF;

mainprog : '(' args ')';
args : ID ID* ;

// I've decided to "disable" this optional construct
optional: {false}?=> ':' element element; element : {true}?=> args ;
----------------------------------------------------------

* predicates completely safe (only true/false)
* compiles (target language has no complaints)
* works sometimes! (accepts string "( x )")
* fails other times.

(the {false} predicate is there just to make the resulting output as
disturbingly incorrect as possible; you can remove it and still get the same
wrong output).

When given an input string of "( x x )" the generated parser says:
   line 1:4 extraneous input 'x' expecting ')'
The reason is that the predicate is being executed far outside its correct
context. Predicate hoisting is significantly harder than the current ANTLR
algorithm thinks it is. :-)  I really don't see any way to implement it
correctly without either nontrivial grammar transformations or a runtime
stack of active predicates.

But all that's about bug#2.

Back to bug#1, where hopefully it should be much simpler for me to see if
I'm confused.

> I get these results with 3.3.  Hmm... it looks like the decision for FOO
FOO* and FOO+ also gets the same thing. which version are you using?

The header line of my ANTLR output says:
   // $ANTLR 3.3 Nov 30, 2010 12:50:56 Simple.g 2011-01-04 12:20:14

Maybe you're saying the DFA is the same so "no problem", while I'm saying
the two parsers accept different languages, so "problem"?

Here's the exact file I just tested:

grammar hoist1;

WS  : (' '|'\t'|'\n'|'\r')+ {skip();};
ID  : ('a'..'z')+;

start : element* EOF ;
element : {true}?=> pre ;
pre : ID+ ;

When I give this grammar an input string of "x", I get:

line 1:0 required (...)+ loop did not match anything at input 'x'

When I change change the "ID+" to "ID ID*", the resulting parser happily
accepts any number of space-separated x's.
Is that not the same result you get?

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe:
http://www.antlr.org/mailman/options/antlr-interest/your-email-address