[antlr-interest] Antlr Bug: Failed semantic predicate in lexer triggers endless loop

Wed Feb 10 11:24:39 PST 2010

Hi,

I've run into something that is definitely a bug in Antlr's lexer code: 
if a semantic predicate fails within a lexer rule, it triggers an 
endless loop. The problem is in the Lexer.nextToken() method:

    public Token nextToken() {
        while (true) {
            state.token = null;
            state.channel = Token.DEFAULT_CHANNEL;
            state.tokenStartCharIndex = input.index();
            state.tokenStartCharPositionInLine = input.getCharPositionInLine();
            state.tokenStartLine = input.getLine();
            state.text = null;
            if ( input.LA(1)==CharStream.EOF ) {
                return Token.EOF_TOKEN;
            }
            try {
                mTokens();
                if ( state.token==null ) {
                    emit();
                }
                else if ( state.token==Token.SKIP_TOKEN ) {
                    continue;
                }
                return state.token;
            }
            catch (NoViableAltException nva) {
                reportError(nva);
                recover(nva); // throw out current char and try again
            }
            catch (RecognitionException re) {
                reportError(re);
                // match() routine has already called recover()
            }
        }
    }

If a NoViableAltException is thrown, recover() is called, which consumes 
one character before the loop continues. But when a semantic predicate 
fails, the rule throws a FailedPredicateException, which is a subclass of 
RecognitionException. As you can see in the code above, that exception is 
caught and reported, but nothing is consumed: the comment assumes match() 
has already called recover(), and a failed predicate is not raised from 
match(). The loop therefore tries matching again at the same point in the 
input and gets the same exception. Here's a sample of Antlr's output 
messages:

line 1:21 rule FLOAT failed predicate: { notIntFollowedByRangeOp() }?
line 1:21 rule FLOAT failed predicate: { notIntFollowedByRangeOp() }?
line 1:21 rule FLOAT failed predicate: { notIntFollowedByRangeOp() }?
line 1:21 rule FLOAT failed predicate: { notIntFollowedByRangeOp() }?
line 1:21 rule FLOAT failed predicate: { notIntFollowedByRangeOp() }?
...
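
The same behaviour can be reproduced outside Antlr with a toy model of 
that loop. This is only a sketch: LexerLoopDemo, PredicateFailed, 
matchOne() and scan() are made-up names, not part of the Antlr runtime, 
and the '?' character simply stands in for the spot where a predicate 
fails. The recover flag models a handler that throws away one character, 
as the NoViableAltException branch does; the error cap exists only so 
the demo terminates.

```java
// Toy model of a nextToken()-style retry loop. None of these names come from ANTLR.
final class LexerLoopDemo {

    /** Hypothetical stand-in for a failed semantic predicate. */
    static final class PredicateFailed extends Exception {
        PredicateFailed(int index) {
            super("failed predicate at index " + index);
        }
    }

    /** "Matches" one character; '?' marks the position where the predicate fails. */
    static int matchOne(String input, int pos) throws PredicateFailed {
        if (input.charAt(pos) == '?') {
            throw new PredicateFailed(pos);
        }
        return pos + 1;
    }

    /**
     * Scans the whole input and returns the number of errors reported,
     * or -1 if maxErrors was reached (the loop was not making progress).
     */
    static int scan(String input, boolean recover, int maxErrors) {
        int pos = 0;
        int errors = 0;
        while (pos < input.length()) {
            try {
                pos = matchOne(input, pos);
            } catch (PredicateFailed e) {
                errors++;                 // analogue of reportError(re)
                if (errors >= maxErrors) {
                    return -1;            // same position keeps failing
                }
                if (recover) {
                    pos++;                // drop one char and try again
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(scan("ab?cd", false, 1000)); // prints -1
        System.out.println(scan("ab?cd", true, 1000));  // prints 1
    }
}
```

Without the recover step the position never advances, so the same error 
is reported until the cap is hit; with it, the bad character is reported 
once and scanning carries on.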

I was able to work around this easily because I already had a custom 
lexer superclass, so I just pasted in that nextToken() code and added a 
"recover(re);" call to the second catch.

Ron

-- 
Ron Hunter-Duvar | Software Developer V | 403-272-6580
Oracle Service Engineering
Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5

All opinions expressed here are mine, and do not necessarily represent
those of my employer.