[antlr-interest] Listing order of NT alternatives on rhs of production appears to affect "accept/reject" of parser for fixed input.

Dejas Ninethousand dejas9000 at gmail.com
Thu Nov 13 08:53:57 PST 2008


Thank you everyone for the response.  Please let me clarify one issue based
on poor word choice on my part.  I wrote:

I believe I have found a peculiar issue in ANTLR.  If memory serves, the
order of alternatives in a grammar should have no effect on the set of
inputs it accepts.  For example I believe:

program : statement_list | expression

is equivalent to:

program : expression | statement_list

I concede that these are not "equivalent" because, as many of you pointed
out, the specific syntax tree generated for some input program can be
different based on ordering of the non terminals.  I completely agree.  What
I meant to say was that changing the ordering of the non terminals should
not affect whether a specific input program to the parser is accepted or
rejected as a whole, irrespective of what the produced syntax tree is.  This
is the issue I am experiencing.  In one ordering of the non terminals in my
grammar the parser accepts my input "DOG CAT x ZEBRA x > x" and in another
ordering the parser throws a RecognitionException.  As a summary of the
issue here is the input, the reordered terminals, and the whole grammar:

Input:

DOG CAT x ZEBRA x > x

Varations of the Grammar:

a : b | c  vs. a : c | b ;

Whole Grammar:

program    : a EOF ;

a : b | c ; // swap b and c to change parser behavior

b : f |    d ;

c : e ;

d : D_NAME idl=f? D_TARGET a (resOp=RES res=a)? -> D_NAME $idl? D_TARGET a
$resOp? $res?;

e : b RCHEVRON b;

f : VERBATUM_IDENTIFIER;

D_NAME
    :    'DOG';

D_TARGET
    :    'CAT';

RES
    :    'ZEBRA';

RCHEVRON
    :    '>';

VERBATUM_IDENTIFIER
    : ('a'..'z'|'A'..'Z')(('a'..'z'|'A'..'Z')|('_')|('0'..'9'))*
    ;

WHITESPACE
    : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+     { $channel = HIDDEN; } ;


On Thu, Nov 13, 2008 at 3:25 AM, Andreas Weigel <
andreas.weigel at tu-harburg.de> wrote:

> Hi again,
>
> thanks for your elaborate reply. I'm not very long into languages,
> grammars etc. and you surely made me more aware of the fact that having
> to use backtracking is something one should definitely avoid (and some
> other considerations when working with parser generators). Still, here
> is an example grammar which does reproduce the issue Dejas came up with
> originally. I'm aware that it is a very ugly grammar and that the issue
> disappears as soon as you "leftfactor" id and opAppl into one rule, but
> again - backtracking turned on - the parser /should/ be able to match a
> valid input correctly imho, which it does not:
>
> grammar parserambiguity;
>  options{backtrack=true; memoize=true;}
>
> model:
>    def* EOF;
>
> def:
>    ID '==' expr;
>
> prefix:
>    (
>        ID ('(' expr (',' expr)* ')')? '!'
>    )+
> ;
>
> id:
>    (prefix)? ID;
>
> opAppl:
>    id
>    ('(' expr (',' expr)* ')');
>
> expr:
>    id | opAppl;
>
>
> ID :
>     ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
>
> ;
>
> WS :
>     (' '|'\t'|'\r'|'\n')+ {$channel=HIDDEN;}
> ;
>
> I used the following input:
> Foo == Bar(bleh)
> Alice == Bob
>
> This in my understanding is a valid input of two 'def's which can be
> derived by the following:
> model -> def def
> -> id == expr def
> -> id == opAppl def
> -> id == id(expr) def
> -> id == id(id) def
> -> id == id(id) id == expr
> -> id == id(id) id == id
>
> Debugging this with Antlrworks 1.2.1 and watching the parse tree the
> parser first tries to match the first expr to an id with prefix (not
> very surprising) which is deemed to fail as the "!" is missing. Then it
> backtracks and matches a single id. But instead of realizing that this
> too, is a wrong decision it just exits the rules down the stack (or is
> it "up" the stack? I'm no native speaker...) and instead of trying to
> jump into oppAppl (which i would expect due to backtrack=true) it simply
> sees the invalid "(" token and throws an MissingTokenException(at ( ).
>
> Changing the order of the alternatives in expr to expr: opAppl | id the
> same example will parse to the end without any problems, also
> backtracking out of opAppl and recognize the lonely id token in the
> second def (Alice == Bob). To my understanding this isn't the behaviour
> one could expect, although I might have overlooked something (did I
> mention that I'm not into grammars and languages for long? ;-)
>
> Hope this is enough of a small example to reproduce the "bug" and if
> not, that someone may point out the mistake in my reasoning.
>
> Andreas
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20081113/6ea9e511/attachment.html 


More information about the antlr-interest mailing list