[antlr-interest] Sample scannerless parser with AST construction in unmodified ANTLR

Sun Apr 17 11:02:08 PDT 2011

You are correct that it would only work properly when it was backtracking. In fact, I was just thinking about this because I need to make some decisions about v4. Even if you turn on backtracking, there might be trouble. In the following case, the "and not" &! predicate inside kreturn is not needed:

foo	: kreturn semi 
	| id semi
	;
kreturn : 'r' 'e' 't' 'u' 'r' 'n' &!LetterOrDigit ws? ;

ANTLR detects the ambiguity and looks for the semi following id or kreturn. As you point out, though, this rule is a problem without the predicate

stat: kreturn e semi
    | id eq e semi
    | id colon
    ;
kreturn : 'r' 'e' 't' 'u' 'r' 'n' ws? ;

because it accepts "returnme;" (no space in between a keyword and the identifier following). ANTLR's DFA predicts the first alternative for "returnme" because it is the only thing that matches the input.  Because there is no ambiguity, turning on backtracking won't force it to backtrack across kreturn. The grammar clearly states that returnme; is legal. 

We would have to force backtracking if grammar analysis ever detected a & or &! predicate as it was creating the DFA. Actually, now that I think about it, perhaps it's time to consider hoisting predicates at look at depth greater than one. Errm, maybe not. Probably just these lookahead operators. For example,

a : 'a' &'x'
  | 'a' &'y'
  ;

ANTLR could create a DFA that tested lookahead

s0 -a-> s1
s1 -&x-> s2 (predict 1)
s1 -&y-> s2 (predict 2)

If the lookahead test  referred to a recursive rule or something, it would need to go off and test the syntactic predicate. This then causes problems for semantic predicates tested during that syntactic predicate because all of the lookahead indexes would be off. input.LA(1) used in the semantic predicate specified manually would not work anymore because the input would be shifted. grrr....

Maybe the simplest answer is to have a true PEG mode that always backtracks (perhaps with a simple LL(1) lookahead check to avoid always having to backtrack), then you can add in whatever predicates you want. If you want scannerless, maybe using pure PEG is the cost.

Ter

On Apr 17, 2011, at 10:25 AM, Peter Kooiman wrote:

>> Still, we could easily do it with a simple {...}? so ANTLR can still do it without formalism, 
>> just as PEGs force you to manually say that as well. :)
> 
> Are you thinking of a validating semantic predicate after matching the keyword? Not sure if that would work always, could that not give rise to a failed predicate error message when we really just want the parser to use the identifier rule instead? Or are you thinking of a semantic predicate checking look-ahead for keyword followed by not-letter not-digit?
> 
> Peter