[antlr-interest] Generating Fake Lexical Tokens

Wed Sep 25 04:23:53 PDT 2002

At 03:14 PM 18/09/2002 +0000, shyamgopale wrote:
>    Consider the Python program
>    if test:
>        print "something"
>        do_something()
>    # Outside if
>    do_somethingmore()
>Now for the above program - The lexer needs to generate
>an INDENT token before the print to let the parser
>know that the following statements are part of an
>if block. And similarly it needs to generate a DEDENT
>token after do_something() to indicate end of the if
>block.
>   I have the logic to generate the INDENT and DEDENT
>tokens but I have no idea how to make the lexer report
>them before or after the real tokens. Can anyone help
>me out with this? I am looking for a way to insert
>additional tokens in the token stream.

Just off the top of my head, could you do what you want with a TokenStream
filter class that sits between the lexer and the parser?

I.e. something like:

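import antlr.Token;
import antlr.TokenStream;
import antlr.TokenStreamException;

// NEWLINE, WHITESPACE, INDENT and DEDENT are assumed to be token type
// constants defined by the lexer's grammar.
public class IndentFilterStream implements TokenStream {

         protected TokenStream lexer = null;
         protected int level = 0;   // width of the previous line's indentation

         public IndentFilterStream(TokenStream in) {
                 lexer = in;
         }

         public Token nextToken() throws TokenStreamException {
                 Token t = lexer.nextToken();
                 if (t.getType() == NEWLINE) {
                         // Look at the token that starts the next line.
                         Token t2 = lexer.nextToken();
                         if (t2.getType() == WHITESPACE) {
                                 int len = t2.getText().length();
                                 if (len > level) t2 = new Token(INDENT);
                                 if (len < level) t2 = new Token(DEDENT);
                                 level = len;
                         }
                         return t2;
                 }
                 return t;
         }
}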

The above has some problems: the parser never sees the NEWLINE tokens or
some of the WHITESPACE tokens, and a line that drops straight back to
column zero produces no DEDENT. But it is a start.
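
One possible way around the lost tokens is to queue the extra tokens inside
the filter and hand them out on later calls, so NEWLINE is still returned
and INDENT/DEDENT are inserted next to it. The following is only a rough
sketch under some assumptions: Tok, TokSource, IndentFilter and IndentDemo
are made-up stand-ins so that it compiles without ANTLR (they are not ANTLR
classes), the lexer is assumed to emit a WHITESPACE token for leading
whitespace, and indentation is measured in characters. It swallows the
leading WHITESPACE, ignores blank lines, and does not report inconsistent
dedents.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for antlr.Token and antlr.TokenStream.
final class Tok {
    static final int EOF = 0, NEWLINE = 1, WHITESPACE = 2, NAME = 3, INDENT = 4, DEDENT = 5;
    final int type;
    final String text;
    Tok(int type, String text) { this.type = type; this.text = text; }
}

interface TokSource {
    Tok nextToken();
}

final class IndentFilter implements TokSource {
    private final TokSource lexer;
    private final Deque<Tok> pending = new ArrayDeque<>();     // fake tokens not yet returned
    private final Deque<Integer> indents = new ArrayDeque<>(); // widths of the open blocks
    private Tok pushedBack = null;                             // one token of lookahead

    IndentFilter(TokSource lexer) {
        this.lexer = lexer;
        indents.push(0);
    }

    private Tok read() {
        Tok t = (pushedBack != null) ? pushedBack : lexer.nextToken();
        pushedBack = null;
        return t;
    }

    public Tok nextToken() {
        if (!pending.isEmpty()) return pending.poll();
        Tok t = read();
        if (t.type == Tok.EOF) {
            // Close every block that is still open, then report EOF.
            while (indents.peek() > 0) {
                indents.pop();
                pending.add(new Tok(Tok.DEDENT, ""));
            }
            pending.add(t);
            return pending.poll();
        }
        if (t.type != Tok.NEWLINE) return t;

        // Measure the indentation of the line that follows the NEWLINE.
        Tok next = read();
        int width = 0;
        if (next.type == Tok.WHITESPACE) {
            width = next.text.length();
            next = read();               // the leading WHITESPACE itself is dropped
        }
        pushedBack = next;               // first real token of the line, returned later
        if (next.type != Tok.NEWLINE && next.type != Tok.EOF) { // skip blank lines
            if (width > indents.peek()) {
                indents.push(width);
                pending.add(new Tok(Tok.INDENT, ""));
            }
            while (width < indents.peek()) {
                indents.pop();
                pending.add(new Tok(Tok.DEDENT, ""));
            }
        }
        return t;                        // the NEWLINE is kept for the parser
    }
}

final class IndentDemo {
    // Runs the filter over a fixed token list and returns the emitted type names.
    static List<String> run(Tok... input) {
        Iterator<Tok> it = Arrays.asList(input).iterator();
        TokSource lexer = () -> it.hasNext() ? it.next() : new Tok(Tok.EOF, "");
        IndentFilter filter = new IndentFilter(lexer);
        String[] names = {"EOF", "NEWLINE", "WHITESPACE", "NAME", "INDENT", "DEDENT"};
        List<String> out = new ArrayList<>();
        for (Tok t = filter.nextToken(); t.type != Tok.EOF; t = filter.nextToken()) {
            out.add(names[t.type]);
        }
        return out;
    }
}
```

With ANTLR the same idea would presumably be written against TokenStream
and Token, with the filter passed to the parser in place of the lexer. The
stack of widths is what allows several DEDENTs to be emitted when a line
closes more than one block at once.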

  - Robert
