[antlr-interest] Antlr Bug: Failed semantic predicate in lexer triggers endless loop

Ron Hunter-Duvar ron.hunter-duvar at oracle.com
Wed Feb 10 12:42:48 PST 2010


Jim,

I showed exactly which code causes the endless loop. It's not generated 
code, and there's nothing to prove. Look at line 99 of 
org.antlr.runtime.Lexer.java (in the 3.2 code), near the end of the 
nextToken() method, the second catch clause. It catches the 
RecognitionException, does nothing but report the error, then it loops 
around and tries the match at the same point in the input. Every 
RecognitionException in any Java lexer which is not caught and handled 
by custom code will trigger an endless loop. This is a bug in the Antlr 
Java run-time code.

I proved this is the problem by adding a "recover(re);" call at that 
point in the Lexer.nextToken() method, which stopped the endless 
looping, allowing the parse to continue and then report failure due to 
the errors encountered. I can provide a patch file for the fix if you 
would like. And perhaps there's a better channel for making Antlr bug 
reports than this mailing list, but if so, I'm not sure what it is.

(And I don't want to get into a flame war here, but I have an M.Sc.(CS), 
I know what the Halting Problem is, I know how to prove that it can't be 
solved, and I also know that it has nothing to do with this discussion. 
I also have more than 20 years programming experience, much of this with 
parsing and language analysis tools. I'm relatively new to Antlr 3, but 
please don't treat me like an idiot, I do know what I'm doing.)

Ron


Jim Idle wrote:
> No it should not generate code like that if can avoid it, but it is difficult to detect that that is what will happen. How would you go about proving that the code will be an endless loop? http://en.wikipedia.org/wiki/Halting_problem 
>  
> Instead, the failed predicate exception at runtime shows that it must be your predicate that is going awry.
>  
> Jim
>  
> From: Ron Hunter-Duvar [mailto:ron.hunter-duvar at oracle.com] 
> Sent: Wednesday, February 10, 2010 11:53 AM
> To: Jim Idle
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Antlr Bug: Failed semantic predicate in lexer triggers endless loop
>  
> Yes, you're right that I didn't have the predicate coded properly in the rule. I haven't seen this limitation mentioned, but it seems that gated semantic predicates within subrules don't work in lexers, only in parsers. That is, they don't just turn off that subrule and make everything else still match. When I moved the predicate to the start of the lexer rule and made it a gated semantic predicate, it did what I wanted.
>
> But that's not an excuse for Antlr going into an endless loop (it's not my code that's looping, it's the Antlr run-time code itself that does the loop). Surely you're not going to tell me that this is correct run-time behaviour, that Antlr is supposed to go into an endless loop if I code a semantic predicate wrong?
>
> Ron
>
>
> Jim Idle wrote: 
> This just means that you haven't covered the predicated cases correctly. In general it means that you needed a gated predicate , not a simple ? predicate. Post the code for further help.
>  
> Basically your lexer should not throw any exceptions or reach unrecognizable input.
>  
> Jim
>  
>  
>   
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Ron Hunter-Duvar
> Sent: Wednesday, February 10, 2010 11:25 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Antlr Bug: Failed semantic predicate in lexer
> triggers endless loop
>  
> Hi,
>  
> I've run into something that is definitely a bug in Antlr's lexer code:
> if a semantic predicate fails within a lexer rule, it triggers an
> endless loop. The problem is in the Lexer.nextToken() method:
>  
>     public Token nextToken() {
>         while (true) {
>             state.token = null;
>             state.channel = Token.DEFAULT_CHANNEL;
>             state.tokenStartCharIndex = input.index();
>             state.tokenStartCharPositionInLine =
> input.getCharPositionInLine();
>             state.tokenStartLine = input.getLine();
>             state.text = null;
>             if ( input.LA(1)==CharStream.EOF ) {
>                 return Token.EOF_TOKEN;
>             }
>             try {
>                 mTokens();
>                 if ( state.token==null ) {
>                     emit();
>                 }
>                 else if ( state.token==Token.SKIP_TOKEN ) {
>                     continue;
>                 }
>                 return state.token;
>             }
>             catch (NoViableAltException nva) {
>                 reportError(nva);
>                 recover(nva); // throw out current char and try again
>             }
>             catch (RecognitionException re) {
>                 reportError(re);
>                 // match() routine has already called recover()
>             }
>         }
>     }
>  
> If a NoViableAltException is thrown, the recover method is called,
> which
> consumes one character and continues. But when a semantic predicate
> fails, it throws a FailedPredicateException, which is a subclass  of
> RecognitionException. As you can see in the code above, the exception
> is
> caught and reported, but it then loops around and tries matching again
> at the same point in the input, resulting in the same exception. Here's
> a sample of Antlr's output messages:
>  
> line 1:21 rule FLOAT failed predicate: { notIntFollowedByRangeOp() }?
> line 1:21 rule FLOAT failed predicate: { notIntFollowedByRangeOp() }?
> line 1:21 rule FLOAT failed predicate: { notIntFollowedByRangeOp() }?
> line 1:21 rule FLOAT failed predicate: { notIntFollowedByRangeOp() }?
> line 1:21 rule FLOAT failed predicate: { notIntFollowedByRangeOp() }?
> ...
>  
> I was able to work around this easily because I already had a custom
> lexer superclass, so I just pasted in that nextToken() code and added a
> "recover(re);" call to the second catch.
>  
> Ron
>  
> --
> Ron Hunter-Duvar | Software Developer V | 403-272-6580
> Oracle Service Engineering
> Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5
>  
> All opinions expressed here are mine, and do not necessarily represent
> those of my employer.
>  
>  
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address
>     
>  
>  
>  
>  
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>  
>   
>
>
>
>   

-- 
Ron Hunter-Duvar | Software Developer V | 403-272-6580
Oracle Service Engineering
Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5

All opinions expressed here are mine, and do not necessarily represent
those of my employer.



More information about the antlr-interest mailing list