[antlr-interest] Token stream filter

Thu Jun 3 03:26:32 PDT 2004

On Thu, Jun 03, 2004 at 09:58:27AM +0100, Anthony Youngman wrote:
> > If you write the code in nextToken to do that it will. The ! operator
> > controls treebuilding it's not the lexer's ! operator. At least I was under
> > the impression you wanted to use a parser to do the filtering not a lexer
> > in front of your original lexer.
>
> Bugger :-(

Life is never easy ;)

> Yes, I did want to use a parser between my original lexer and parser. Or
> can I put a lexer there instead? Basically, I don't care whether it's a
> lexer or parser, I just want to sit it between my primary lexer and
> parser to strip out stuff I don't want and/or modify stuff I do.

Well technically you could put a lexer there but it would probably teach
you more about antlr internals than you want to know ;) Seriously though,
you can probably do everything you want with an extra parser in between
stuff.

> Can I lex a token stream as well as a character stream? And if so, will
> the second lexer see hidden tokens (I presume not).

You'll be parsing the token stream and storing tokens in a list/queue while
you don't know what to do with them yet.

You get a setup where:

1. Your original parser asks the filter parser for a token. (by calling LA(x))
2. Your filter parser comes into nextToken
   1. sees wether it has tokens queued to pass on
   2. if not scarf tokens from the original lexer and see what should be
      done with them
   3. pass leftover bits to the calling parser

This is the setup that is explained in Monty's filter example. He's using a
queue to store tokens that are still considered to be passed on.

Consider the nextToken of his filter:

public Token nextToken() throws TokenStreamException
{
   Token ret;
   if (queue.length()<=0)
   {
      try
      {
         jumpStatements();
      }
      catch ( RecognitionException e) {;}
      catch ( TokenStreamException e) {;}
   }
   if (queue.length()>0)
   {
      ret = queue.elementAt(0);
      queue.removeFirst();
      return ret;
   }
   return new ArevToken(Token.EOF_TYPE,"");
}

jumpStatements in the above is a normal parser rule e.g. it get's it's
tokens from the original lexer. generally this will do a number of LA(x)
calls to get tokens these are checked with the match methods which again
call the consume method when the match is succesfull. This is the mechanism
you want to use.

I'm not 100% sure if it will work but you can probably add a store boolean
attribute to the filter and then:

Original filter rule:

commentst : (EOL | SEMI) ("*" | "!")! (~(EOL)*)! ;

New:

commentst : { store = true; } (EOL | SEMI) { store = false; }
            ("*" | "!")! (~(EOL)*)! { store = true } ;

Then in consume you'll only append tokens when store is true. You might
need a 'copy-the-rest' rule as well. Not sure since I'm only just now
tinkering with this stuff myself as well (which also explains my interest
in the topic...).

You can also handcode nextToken to parse'n'massage the incoming token
stream. Considering the complexity of the commentst rule this might be a
good option. Actually it's hardly worth using a parser for it. This might
come a long way just implement the tokenstream abstract interface and use
something like this for filter (this lacks exception handling):

public Token nextToken() throws TokenStreamException
{
	if( queue.size() <= 0 )
	{
		// look for tokens
		if ((LA(1) == EOL) ||(LA(1) == SEMI)) {
			queue.append(LT(1));
			origlexer.consume();

			if ((LA(1) == STAR) || (LA(1) == BANG)) {
				// don't queue
				origlexer.consume();
				while( LA(1) != EOL )
				{
					// don't queue
					origlexer.consume();
				}
			}
			else
			{
				queue.append(LT(1));
				origlexer.consume();
			}
		}
		else
		{
			queue.append(LT(1));
			origlexer.consume();
		}
	}
	if( queue.size() > 0 )
		... return front token of the queue ....
	else
		... return EOFTOKEN ...
}

Monty's original probably needs some tinkering with respect to
EOF/EOL/exception handling for your case.

Cheers,

Ric
--
-----+++++*****************************************************+++++++++-------
    ---- Ric Klaren ----- j.klaren at utwente.nl ----- +31 53 4893755  ----
-----+++++*****************************************************+++++++++-------
     Innovation makes enemies of all those who prospered under the old
   regime, and only lukewarm support is forthcoming from those who would
               prosper under the new. --- Niccolò Machiavelli

Yahoo! Groups Links

<*> To visit your group on the web, go to:
     http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
     antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
     http://docs.yahoo.com/info/terms/