[antlr-interest] Using a Parser as a TokenFilter
Chris Black
chris at lotuscat.com
Wed May 11 11:06:50 PDT 2005
I'm trying to accomplish some token filtering by following Monty's
document http://www.codetransform.com/filterexample.html but am running
into a few problems. This is related to the previous thread about
detecting transitions in stanza-based files. Initially I thought the
TokenStreamRewriteFilter would be a good base class, but it turns out
that it doesn't implement TokenStream and is instead for spitting actual
text back out.
The goal is to insert imaginary tokens to help the downline parser in
some cases (when I see short lines) and remove some tokens in other
cases. I thought this would be relatively easy but I think I'm missing
something.
To start, I'm just trying to strip the extra commas at the end of each
line, using something like the code at the end of this message. Not
only does this give me a stack overflow error when it
actually does encounter extra commas, but it also seems to cause an
"unexpected token: null" error in the downline parser in other cases,
even after adding an EOF at the end of the main rule. After
building/running with -trace, I think this may have something to do with
the lookahead being filled with nulls.
At this point I feel like I'm missing something fundamental because I've
been trying to get this filter idea to work for hours. Does anyone have
a working example or any pointers?
Chris
PS Sorry for the mess. I had tried to use a special endOfLine rule, but
I don't think that will work: there would be nondeterminism, and I know
of no way to detect the transition from the line body to the last three
tokens. In a regex I could anchor this to the end of the line. I looked
at mark/rewind but haven't figured out the proper way to use it.
But at this point that is the least of my problems :)
// filter to change lines like "foo,bar,baz,,,,,,,," into "foo,bar,baz,"
public void consume() {
    try {
        if(LA(1) == DELIM && LA(2) == DELIM && LA(3) == DELIM) {
            //System.out.println("skipping extra commas");
            //System.out.flush();
            queue.append(LT(1));
            consumeUntil(NEWLINE);
        } else {
            queue.append(LT(1));
        }
        super.consume();
    } catch(TokenStreamException e) {
        System.err.println("error in consume");
        System.err.println(e);
        e.printStackTrace();
    }
}
public Token nextToken() throws TokenStreamException {
    Token ret;
    if(queue.length() <= 0) {
        try {
            line();
        } catch(RecognitionException e) { ; }
        catch(TokenStreamException e) { ; }
    }
    if(queue.length() > 0) {
        ret = queue.elementAt(0);
        queue.removeFirst();
        return ret;
    }
    System.out.println("no more queue, returning EOF");
    return new Token(Token.EOF_TYPE,"");
}
}
line:
      (NEWLINE) => emptyLine
    | ((FIELD | DELIM)+ NEWLINE) => contentLine
    ;
emptyLine: NEWLINE ;
contentLine: (FIELD | DELIM)+ NEWLINE ;
//contentLine: (FIELD | DELIM)+ endOfLine ;
endOfLine:
      (DELIM FIELD NEWLINE) => fieldNewlineEOL
    | (FIELD DELIM NEWLINE) => fieldDelimNewlineEOL
    | (DELIM DELIM NEWLINE) => extraDelimsEOL
    ;
fieldNewlineEOL: DELIM FIELD NEWLINE ;
fieldDelimNewlineEOL: FIELD DELIM NEWLINE ;
extraDelimsEOL: DELIM (DELIM!)+ NEWLINE ;
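
For comparison, here is a minimal standalone sketch of the queue-based
filtering idea described above, written in plain Java without ANTLR so
that it can run on its own. The names TrailingDelimFilter, Tok, Type,
lex and filter are hypothetical and are not part of ANTLR or of the
grammar above. It assumes that "extra commas" means a run of two or more
DELIM tokens directly before the NEWLINE, which it collapses to a single
DELIM. It buffers one whole line before emitting anything, so the end of
the line is known before any decision is made.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a token-stream filter; not an ANTLR class.
final class TrailingDelimFilter {
    enum Type { FIELD, DELIM, NEWLINE, EOF }

    static final class Tok {
        final Type type;
        final String text;
        Tok(Type type, String text) { this.type = type; this.text = text; }
    }

    private final Iterator<Tok> input;
    private final Deque<Tok> queue = new ArrayDeque<>();

    TrailingDelimFilter(Iterator<Tok> input) { this.input = input; }

    // Returns the next filtered token, or an EOF token once input is exhausted.
    Tok nextToken() {
        if (queue.isEmpty()) {
            fillLine();
        }
        return queue.isEmpty() ? new Tok(Type.EOF, "") : queue.removeFirst();
    }

    // Buffers one line, trims the trailing DELIM run to one DELIM, queues the result.
    private void fillLine() {
        List<Tok> line = new ArrayList<>();
        Tok terminator = null;
        while (input.hasNext()) {
            Tok t = input.next();
            if (t.type == Type.NEWLINE || t.type == Type.EOF) {
                terminator = t;
                break;
            }
            line.add(t);
        }
        int end = line.size();
        // Drop a trailing DELIM as long as the token before it is also a DELIM.
        while (end >= 2
                && line.get(end - 1).type == Type.DELIM
                && line.get(end - 2).type == Type.DELIM) {
            end--;
        }
        queue.addAll(line.subList(0, end));
        if (terminator != null) {
            queue.add(terminator);
        }
    }

    // Toy lexer for the demo: ',' is DELIM, '\n' is NEWLINE, anything else is FIELD text.
    static List<Tok> lex(String s) {
        List<Tok> out = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == ',' || c == '\n') {
                if (field.length() > 0) {
                    out.add(new Tok(Type.FIELD, field.toString()));
                    field.setLength(0);
                }
                out.add(new Tok(c == ',' ? Type.DELIM : Type.NEWLINE, String.valueOf(c)));
            } else {
                field.append(c);
            }
        }
        if (field.length() > 0) {
            out.add(new Tok(Type.FIELD, field.toString()));
        }
        return out;
    }

    // Runs the filter over a string and glues the surviving token texts back together.
    static String filter(String s) {
        TrailingDelimFilter f = new TrailingDelimFilter(lex(s).iterator());
        StringBuilder sb = new StringBuilder();
        for (Tok t = f.nextToken(); t.type != Type.EOF; t = f.nextToken()) {
            sb.append(t.text);
        }
        return sb.toString();
    }
}
```

Calling TrailingDelimFilter.filter("foo,bar,baz,,,,,,,,\n") should give
"foo,bar,baz,\n". Delimiters in the middle of a line (for example
"a,,b\n") are left alone, because only the run directly before the line
terminator is trimmed.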