[antlr-interest] Listing order of NT alternatives on rhs of production appears to affect "accept/reject" of parser for fixed input.
Andreas Weigel
andreas.weigel at tu-harburg.de
Thu Nov 13 01:25:36 PST 2008
Hi again,
thanks for your elaborate reply. I'm not very long into languages,
grammars etc. and you surely made me more aware of the fact that having
to use backtracking is something one should definitely avoid (and some
other considerations when working with parser generators). Still, here
is an example grammar which does reproduce the issue Dejas came up with
originally. I'm aware that it is a very ugly grammar and that the issue
disappears as soon as you "leftfactor" id and opAppl into one rule, but
again - backtracking turned on - the parser /should/ be able to match a
valid input correctly imho, which it does not:
grammar parserambiguity;
options{backtrack=true; memoize=true;}
model:
def* EOF;
def:
ID '==' expr;
prefix:
(
ID ('(' expr (',' expr)* ')')? '!'
)+
;
id:
(prefix)? ID;
opAppl:
id
('(' expr (',' expr)* ')');
expr:
id | opAppl;
ID :
('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
;
WS :
(' '|'\t'|'\r'|'\n')+ {$channel=HIDDEN;}
;
I used the following input:
Foo == Bar(bleh)
Alice == Bob
This in my understanding is a valid input of two 'def's which can be
derived by the following:
model -> def def
-> id == expr def
-> id == opAppl def
-> id == id(expr) def
-> id == id(id) def
-> id == id(id) id == expr
-> id == id(id) id == id
Debugging this with Antlrworks 1.2.1 and watching the parse tree the
parser first tries to match the first expr to an id with prefix (not
very surprising) which is deemed to fail as the "!" is missing. Then it
backtracks and matches a single id. But instead of realizing that this
too, is a wrong decision it just exits the rules down the stack (or is
it "up" the stack? I'm no native speaker...) and instead of trying to
jump into oppAppl (which i would expect due to backtrack=true) it simply
sees the invalid "(" token and throws an MissingTokenException(at ( ).
Changing the order of the alternatives in expr to expr: opAppl | id the
same example will parse to the end without any problems, also
backtracking out of opAppl and recognize the lonely id token in the
second def (Alice == Bob). To my understanding this isn't the behaviour
one could expect, although I might have overlooked something (did I
mention that I'm not into grammars and languages for long? ;-)
Hope this is enough of a small example to reproduce the "bug" and if
not, that someone may point out the mistake in my reasoning.
Andreas
More information about the antlr-interest
mailing list