[antlr-interest] Listing order of NT alternatives on rhs of production appears to affect "accept/reject" of parser for fixed input.

Andreas Weigel andreas.weigel at tu-harburg.de
Thu Nov 13 01:25:36 PST 2008


Hi again,

thanks for your elaborate reply. I'm not very long into languages, 
grammars etc. and you surely made me more aware of the fact that having 
to use backtracking is something one should definitely avoid (and some 
other considerations when working with parser generators). Still, here 
is an example grammar which does reproduce the issue Dejas came up with 
originally. I'm aware that it is a very ugly grammar and that the issue 
disappears as soon as you "leftfactor" id and opAppl into one rule, but 
again - backtracking turned on - the parser /should/ be able to match a 
valid input correctly imho, which it does not:

grammar parserambiguity;
 options{backtrack=true; memoize=true;}

model:
    def* EOF;

def:
    ID '==' expr;

prefix:
    (
        ID ('(' expr (',' expr)* ')')? '!'
    )+
;

id:
    (prefix)? ID;

opAppl:   
    id
    ('(' expr (',' expr)* ')');

expr:
    id | opAppl;


ID :
     ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
     
;

WS :
     (' '|'\t'|'\r'|'\n')+ {$channel=HIDDEN;}
;

I used the following input:
Foo == Bar(bleh)
Alice == Bob

This in my understanding is a valid input of two 'def's which can be 
derived by the following:
model -> def def
-> id == expr def
-> id == opAppl def
-> id == id(expr) def
-> id == id(id) def
-> id == id(id) id == expr
-> id == id(id) id == id

Debugging this with Antlrworks 1.2.1 and watching the parse tree the 
parser first tries to match the first expr to an id with prefix (not 
very surprising) which is deemed to fail as the "!" is missing. Then it 
backtracks and matches a single id. But instead of realizing that this 
too, is a wrong decision it just exits the rules down the stack (or is 
it "up" the stack? I'm no native speaker...) and instead of trying to 
jump into oppAppl (which i would expect due to backtrack=true) it simply 
sees the invalid "(" token and throws an MissingTokenException(at ( ).

Changing the order of the alternatives in expr to expr: opAppl | id the 
same example will parse to the end without any problems, also 
backtracking out of opAppl and recognize the lonely id token in the 
second def (Alice == Bob). To my understanding this isn't the behaviour 
one could expect, although I might have overlooked something (did I 
mention that I'm not into grammars and languages for long? ;-)

Hope this is enough of a small example to reproduce the "bug" and if 
not, that someone may point out the mistake in my reasoning.

Andreas



More information about the antlr-interest mailing list