[antlr-interest] FailedPredicateException leads to infinite loop - bug in the Lexer?

Cliff Hudson cliff.s.hudson at gmail.com
Tue Mar 30 11:45:18 PDT 2010


It's certainly possible that I have a bug in my lexer, since it has been
*ages* since I did any language programming.  I posted a repro scenario
here, based on the sample XML lexer/parser provided on the ANTLR website.
With that parser, the following input causes an infinite loop:

<Program><</Program>

The <</ sequence leads to a state where no predicate matches, so the lexer
cannot recover.  The first < in the sequence enters tag mode, and all
subsequent tokens are then invalid in tag mode.  PCDATA matches, but its
predicate prevents it from running.  My question, then: what is the
appropriate way to construct the lexer so that it recovers gracefully from
that invalid input and does NOT go into the infinite loop caused by the
thrown exception?
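
To make the failure mode concrete, here is a minimal sketch in Java of the
loop described in the quoted messages below. It is a toy model, not ANTLR
code: ToyLexer, PredicateFailure, tokenize and the maxAttempts cap are all
hypothetical names invented for illustration, and the tag-mode rules are a
rough simplification of the XML sample. It only shows why retrying at the
same input position never terminates, and how consuming one character in
the catch block (the workaround Ron describes) lets the loop move on.

```java
import java.util.ArrayList;
import java.util.List;

public class LexerRecoveryDemo {
    /** Hypothetical stand-in for a failed predicate; not the ANTLR exception class. */
    static class PredicateFailure extends RuntimeException {
        PredicateFailure(String message) { super(message); }
    }

    /** Toy lexer: '<' enters tag mode, '>' leaves it, and '<' is invalid inside a tag. */
    static class ToyLexer {
        final String input;
        final boolean recoverOnFailure;
        int pos = 0;
        boolean tagMode = false;
        int errors = 0;
        boolean stuck = false;

        ToyLexer(String input, boolean recoverOnFailure) {
            this.input = input;
            this.recoverOnFailure = recoverOnFailure;
        }

        /** Matches one token at pos, or throws without consuming any input. */
        String matchToken() {
            char c = input.charAt(pos);
            int start = pos;
            if (!tagMode) {
                if (c == '<') { tagMode = true; pos++; return "<"; }
                while (pos < input.length() && input.charAt(pos) != '<') pos++;
                return input.substring(start, pos); // text between tags
            }
            if (c == '>') { tagMode = false; pos++; return ">"; }
            if (c == '/') { pos++; return "/"; }
            if (Character.isLetter(c)) {
                while (pos < input.length() && Character.isLetter(input.charAt(pos))) pos++;
                return input.substring(start, pos); // tag name
            }
            throw new PredicateFailure("nothing matches '" + c + "' at index " + pos);
        }

        /** Token loop; maxAttempts caps the retries so the demo itself cannot hang. */
        List<String> tokenize(int maxAttempts) {
            List<String> tokens = new ArrayList<>();
            int attempts = 0;
            while (pos < input.length()) {
                if (++attempts > maxAttempts) { stuck = true; break; }
                try {
                    tokens.add(matchToken());
                } catch (PredicateFailure e) {
                    errors++;                    // report the error
                    if (recoverOnFailure) pos++; // recover: drop one character
                }
            }
            return tokens;
        }
    }

    public static void main(String[] args) {
        String input = "<Program><</Program>";
        ToyLexer unpatched = new ToyLexer(input, false);
        unpatched.tokenize(1000);
        System.out.println("without recovery, stuck: " + unpatched.stuck);
        ToyLexer patched = new ToyLexer(input, true);
        System.out.println("with recovery: " + patched.tokenize(1000)
                + ", errors: " + patched.errors);
    }
}
```

Without the pos++ in the catch block, every retry sees the same character
and fails the same way, which is the hang reported in this thread. Whether
skipping a character is the right recovery for a real grammar is a separate
design question; a grammar that has a rule for the bad input avoids the
exception altogether.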


On Tue, Mar 30, 2010 at 11:34 AM, Jim Idle <jimi at temporal-wave.com> wrote:

> Actually, I did not dispute that the hang was a bug, but stated that it was
> really somewhat irrelevant because the bug is in your predicate
> specifications. Lexers should not really be throwing exceptions but should
> be coded to deal with any input in a controlled manner. Your code was looked
> at, but as there has been no release since you reported your bug, I am not
> sure what you expect just at the moment.
>
> Of course, while debugging, it would be better if the lexer did not go
> into an endless loop; that is an oversight and we should do something about
> that (in the next release - but I have not recently even had time to raise
> the JIRA). But you should not be relying on exceptions and recovery in the
> lexer; especially not in something that is commercial. Fix your
> predicates/rules/etc and cater for the error cases; your error messages will
> be commensurately better and your users will thank you in kind.
>
> Jim
>
>
>
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org
> > [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Ron Hunter-Duvar
> > Sent: Tuesday, March 30, 2010 10:58 AM
> > To: Cliff Hudson
> > Cc: antlr-interest at antlr.org
> > Subject: Re: [antlr-interest] FailedPredicateException leads to
> > infinite loop - bug in the Lexer?
> >
> > The message title was: "Antlr Bug: Failed semantic predicate in lexer
> > triggers endless loop"
> >
> > Basically, the bug is in the nextToken() method in Antlr's Lexer class.
> > If a NoViableAlt exception is thrown, the method calls the recover
> > method, which consumes one character before trying again. But if any
> > other RecognitionException (including a FailedPredicateException) is
> > thrown, it doesn't call recover, it just reports the exception and loops
> > back to try again at the same point, which of course results in the same
> > exception, because nothing has changed. I just added a recover call to
> > the second catch block, and it solved the problem.
> >
> > As I mentioned, I did this as an overriding method in a custom
> > superclass. You could also fix it directly in the Antlr source, but then
> > you'd have to rebuild Antlr.
> >
> > I'm not sure this ever got reported as an official bug. Jim Idle
> > disputed whether it was an Antlr bug. I sent proof, but not sure it was
> > even looked at. Unfortunately, because of the (quite reasonable) legal
> > agreement required to submit bug reports, and the fact that I'm doing
> > this as an employee, I can't submit bug reports or fixes until I get
> > approval to do so (seems ridiculous at first glance, but in today's
> > legal climate I can't take chances).
> >
> > Ron
> >
> >
> > Cliff Hudson wrote:
> > > I've been all over the archives, but perhaps my search terms were
> > > inadequate.  I'll look again with that date in mind.  Thanks.
> > >
> > > On Tue, Mar 30, 2010 at 8:11 AM, Ron Hunter-Duvar
> > > <ron.hunter-duvar at oracle.com> wrote:
> > >
> > >     Hi Cliff,
> > >
> > >     I reported this same problem on February 10 on this list. It's an
> > >     Antlr bug, and my emails on it had the workaround (which requires
> > >     you to implement a custom superclass if you haven't already). If
> > >     you search the list archives you should be able to find it.
> > >
> > >     Ron
> > >
> > >
> > >     Cliff Hudson wrote:
> > >
> > >         I have been trying to work through an issue with an infinite
> > >         loop caused when no tokens can be matched because a predicate
> > >         has failed its test.  The problem appears to be in the
> > >         Lexer.NextToken() method (looking at the CSharp2 sources),
> > >         which I have copied here for reference:
> > >
> > >         /// <summary>
> > >         /// Return a token from this source; i.e., Match a token on the char stream.
> > >         /// </summary>
> > >         public virtual IToken NextToken()
> > >         {
> > >             while (true)
> > >             {
> > >                 state.token = null;
> > >                 state.channel = Token.DEFAULT_CHANNEL;
> > >                 state.tokenStartCharIndex = input.Index;
> > >                 state.tokenStartCharPositionInLine = input.CharPositionInLine;
> > >                 state.tokenStartLine = input.Line;
> > >                 state.text = null;
> > >                 if (input.LA(1) == (int)CharStreamConstants.EOF)
> > >                 {
> > >                     return Token.EOF_TOKEN;
> > >                 }
> > >                 try
> > >                 {
> > >                     mTokens();
> > >                     if (state.token == null)
> > >                     {
> > >                         Emit();
> > >                     }
> > >                     else if (state.token == Token.SKIP_TOKEN)
> > >                     {
> > >                         continue;
> > >                     }
> > >                     return state.token;
> > >                 }
> > >                 catch (NoViableAltException nva) {
> > >                     ReportError(nva);
> > >                     Recover(nva); // throw out current char and try again
> > >                 }
> > >                 catch (RecognitionException re) {
> > >                     ReportError(re);
> > >                     // Match() routine has already called Recover()
> > >                 }
> > >             }
> > >         }
> > >
> > >         Note the RecognitionException clause.  This is the clause which
> > >         will catch the FailedPredicateException().  Unfortunately,
> > >         because the FailedPredicateException gets thrown just before
> > >         Match() occurs in the rule, Recover will *not* have been called
> > >         by the rule or its callees, and therefore the DFA will continue
> > >         to try processing the same token.
> > >         It would appear that there should instead be a specific
> > >         FailedPredicateException clause which does the same thing as
> > >         the NoViableAltException clause.
> > >
> > >         I have seen two other people ask questions about this error,
> > >         and in neither case was a suitable response given.  Has this
> > >         bug been fixed in non-released builds?  Can someone give me an
> > >         up-or-down on whether this is a correct and appropriate fix?
> > >
> > >         Thanks.
> > >
> > >         - Cliff
> > >
> > >         List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > >         Unsubscribe:
> > >         http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> > >
> > >
> > >
> > >
> > >     --
> > >     Ron Hunter-Duvar | Software Developer V | 403-272-6580
> > >     Oracle Service Engineering
> > >     Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5
> > >
> > >     All opinions expressed here are mine, and do not necessarily represent
> > >     those of my employer.
> > >
> > >
> >
> > --
> > Ron Hunter-Duvar | Software Developer V | 403-272-6580
> > Oracle Service Engineering
> > Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5
> >
> > All opinions expressed here are mine, and do not necessarily represent
> > those of my employer.
> >
> >
>
>

