[antlr-interest] Re: Reusing Lexer/Parser instances

Sat Jul 12 18:44:49 PDT 2003

I had a requirement to recognize identifiers ("unquoted names" below)
quickly. After profiling to find the performance problem in my
application, I ended up writing this in my lexer (SmnLexer.g) so that
I could reuse a single lexer instance. It isn't the most elegant
solution, but it works in my application:

private static class StringMatchBuffer extends antlr.InputBuffer {
    /** The string being matched; assign it before each use. */
    public String text;

    StringMatchBuffer() { super(); }

    /** Nothing to read: the whole input is already in text, so just sync. */
    public void fill(int amount) { syncConsume(); }

    /** Return the not-yet-consumed remainder of the input. */
    public String getLAChars() {
        return text.substring(markerOffset + numToConsume);
    }

    public String getMarkedChars() {
        throw new UnsupportedOperationException();
    }

    /** Get a lookahead character, or EOF_CHAR past the end of the string */
    public char LA(int i) throws antlr.CharStreamException {
        // Include consumption that has been requested but not yet synced.
        int index = markerOffset + numToConsume + i - 1;
        return index >= text.length() ? EOF_CHAR : text.charAt(index);
    }

    /** Sync up deferred consumption */
    protected void syncConsume() {
        markerOffset += numToConsume;
        numToConsume = 0;
    }
}
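
The offset arithmetic in LA() and syncConsume() can be tried outside
ANTLR. What follows is only a minimal standalone sketch, not the
antlr.InputBuffer API: the class name StringLookahead, its EOF constant
and its reset(String) method are made up for illustration, and
mark()/rewind() are left out.

```java
/** Standalone sketch of a String-backed lookahead buffer with deferred consumption. */
final class StringLookahead {
    /** Hypothetical end-of-input sentinel; ANTLR 2 defines its own EOF_CHAR. */
    static final char EOF = (char) -1;

    private String text = "";
    private int offset;        // characters already committed as consumed
    private int numToConsume;  // consumption requested but not yet applied

    /** Point the buffer at a new string and rewind, so one instance can be reused. */
    void reset(String s) {
        text = s;
        offset = 0;
        numToConsume = 0;
    }

    /** Request consumption of one character; the offset is updated lazily. */
    void consume() {
        numToConsume++;
    }

    /** Apply any deferred consumption to the committed offset. */
    void syncConsume() {
        offset += numToConsume;
        numToConsume = 0;
    }

    /** Return the i-th lookahead character (1-based), or EOF past the end. */
    char LA(int i) {
        // Pending consumption counts too, so no sync is needed before looking ahead.
        int index = offset + numToConsume + i - 1;
        return index >= text.length() ? EOF : text.charAt(index);
    }
}
```

Because nothing is copied into a queue, switching to a new input is just
a field assignment plus zeroing two counters, which is what makes
reusing one instance cheap.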

private final static StringMatchBuffer stringMatchBuffer
      = new StringMatchBuffer();
private final static SmnLexer stringMatchLexer
      = new SmnLexer(new antlr.LexerSharedInputState(stringMatchBuffer));
private int rememberTokenType;

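public static synchronized boolean matchesUnquotedName(String s) {
    try {
        // Point the shared buffer at the new input and rewind the lexer state.
        stringMatchBuffer.text = s;
        stringMatchLexer.inputState.reset();
        stringMatchLexer.rememberTokenType = Token.INVALID_TYPE;
        stringMatchLexer.resetText();
        // Invoke the rule method directly; no Token object is needed.
        stringMatchLexer.mUNQUOTED_NAME(false);
        // The rule must have matched and must have used up the whole string.
        return stringMatchLexer.rememberTokenType
                   == SmnLexerTokenTypes.UNQUOTED_NAME
               && stringMatchLexer.LA(1) == EOF_CHAR;
    } catch (ANTLRException e) {
        // A mismatch surfaces as an exception from the rule method.
        return false;
    }
}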

Lexer rule:

UNQUOTED_NAME
     : ( ('A'..'Z' | 'a'..'z') ('A'..'Z' | 'a'..'z' | '0'..'9' | '_' )* )
     {   $setType(testLiteralsTable(_ttype));
         rememberTokenType = _ttype;
     }
     ;
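
For comparison, the character pattern of UNQUOTED_NAME can also be
checked without a lexer. This is a hand-written sketch that assumes
ASCII-only names, as in the rule; the class and method names are
hypothetical. It deliberately leaves out the testLiteralsTable() step,
so unlike matchesUnquotedName() it would presumably still accept a name
that is also an entry in the lexer's literals table.

```java
/** Hand-written check for the UNQUOTED_NAME character pattern (no literals lookup). */
final class UnquotedNames {
    private static boolean isLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    private static boolean isNameChar(char c) {
        return isLetter(c) || (c >= '0' && c <= '9') || c == '_';
    }

    /** True if s is a letter followed by zero or more letters, digits or underscores. */
    static boolean matchesPattern(String s) {
        if (s.isEmpty() || !isLetter(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!isNameChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Going through the lexer rule keeps the definition of a name in one
place (the grammar), at the cost of the buffer plumbing above.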
