[antlr-interest] Using a Parser as a TokenFilter

Wed May 11 11:06:50 PDT 2005

I'm trying to accomplish some token filtering by following Monty's 
document http://www.codetransform.com/filterexample.html but am running 
into a few problems. This is related to the previous thread about 
detecting transitions in stanza-based files. Initially I thought the 
TokenStreamRewriteFilter would be a good base class, but it turns out 
that it doesn't implement TokenStream and is instead for spitting actual 
text back out.

The goal is to insert imaginary tokens to help the downline parser in 
some cases (when I see short lines) and remove some tokens in other 
cases. I thought this would be relatively easy but I think I'm missing 
something.

To start, I'm just trying to strip the extra commas at the end of a 
line, using something like the code at the end of this message. Not only 
does this give me a stack overflow error when it actually encounters 
extra commas, but it also seems to cause an "unexpected token: null" 
error in the downline parser in other cases, even after adding an EOF at 
the end of the main rule. After building and running with -trace, I 
think this may have something to do with the lookahead being filled with 
nulls.

At this point I feel like I'm missing something fundamental because I've 
been trying to get this filter idea to work for hours. Does anyone have 
a working example or any pointers?

Chris

PS Sorry for the mess. I had tried to have a special endOfLine rule, but 
I don't think that will work, since there would be nondeterminism and no 
way that I know of to detect the transition from the line body to the 
last three tokens. In a regex I could anchor this to the end of the 
line. I looked at mark/rewind but haven't figured out the proper way to 
use them. But at this point that is the least of my problems :)

// filter to change lines like "foo,bar,baz,,,,,,,," into "foo,bar,baz,"
    public void consume() {
        try {
            if(LA(1) == DELIM && LA(2) == DELIM && LA(3) == DELIM) {
                //System.out.println("skipping extra commas");
                //System.out.flush();
                queue.append(LT(1));
                consumeUntil(NEWLINE);
            } else {
                queue.append(LT(1));
            }
            super.consume();
        } catch(TokenStreamException e) {
            System.err.println("error in consume");
            System.err.println(e);
            e.printStackTrace();
        }
    }

    public Token nextToken() throws TokenStreamException {
        Token ret;
        if(queue.length() <= 0) {
            try {
                line();
            } catch(RecognitionException e) { ; }
            catch(TokenStreamException e) { ; }
        }
        if(queue.length() > 0) {
            ret = queue.elementAt(0);
            queue.removeFirst();
            return ret;
        }
        System.out.println("no more queue, returning EOF");
        return new Token(Token.EOF_TYPE,"");
    }
}

line:
    (NEWLINE) => emptyLine
    | ((FIELD | DELIM)+ NEWLINE) => contentLine
    ;

emptyLine: NEWLINE ;

contentLine: (FIELD | DELIM)+ NEWLINE ;
//contentLine: (FIELD | DELIM)+ endOfLine ;

endOfLine:
    (DELIM FIELD NEWLINE) => fieldNewlineEOL
    | (FIELD DELIM NEWLINE) => fieldDelimNewlineEOL
    | (DELIM DELIM NEWLINE) => extraDelimsEOL
    ;

fieldNewlineEOL: DELIM FIELD NEWLINE ;

fieldDelimNewlineEOL: FIELD DELIM NEWLINE ;

extraDelimsEOL: DELIM (DELIM!)+ NEWLINE ;
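
For reference, the behavior the filter is aiming for (buffer one line, 
trim the run of trailing delimiters down to one, then hand tokens out of 
a queue) can be sketched without ANTLR at all. This is only a minimal 
illustration under stated assumptions: the class TrailingDelimFilter, 
its Type enum, and filterAll are hypothetical names, the token types are 
modeled on the grammar above, and unlike the posted consume() it trims 
any run of two or more trailing DELIMs rather than requiring three.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-alone sketch; it does not use the ANTLR classes. */
public class TrailingDelimFilter {
    // Token types named after the ones in the grammar (an assumption).
    public enum Type { FIELD, DELIM, NEWLINE, EOF }

    private final Iterator<Type> input;
    private final Deque<Type> queue = new ArrayDeque<>();

    public TrailingDelimFilter(Iterator<Type> input) {
        this.input = input;
    }

    /** Reads one line, keeps at most one trailing DELIM, and queues the result. */
    private void fillQueue() {
        List<Type> line = new ArrayList<>();
        boolean sawNewline = false;
        while (input.hasNext()) {
            Type t = input.next();
            if (t == Type.NEWLINE) {
                sawNewline = true;
                break;
            }
            line.add(t);
        }
        // Walk back over the run of trailing DELIMs.
        int end = line.size();
        while (end > 0 && line.get(end - 1) == Type.DELIM) {
            end--;
        }
        // Keep the line body plus one trailing DELIM, if there was any.
        int keep = Math.min(line.size(), end + 1);
        queue.addAll(line.subList(0, keep));
        if (sawNewline) {
            queue.add(Type.NEWLINE);
        }
    }

    /** Returns the next filtered token, or EOF once the input is exhausted. */
    public Type nextToken() {
        if (queue.isEmpty()) {
            fillQueue();
        }
        return queue.isEmpty() ? Type.EOF : queue.removeFirst();
    }

    /** Convenience helper: runs the filter over a whole token list. */
    public static List<Type> filterAll(List<Type> tokens) {
        TrailingDelimFilter f = new TrailingDelimFilter(tokens.iterator());
        List<Type> out = new ArrayList<>();
        for (Type t = f.nextToken(); t != Type.EOF; t = f.nextToken()) {
            out.add(t);
        }
        return out;
    }
}
```

Because the whole line is buffered before anything is queued, the end of 
the line is known when the trimming happens, so no lookahead test inside 
consume() is needed in this sketch.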