[antlr-interest] Help with discarding lexer tokens....

Fredrik Ohrstrom oehrstroem at gmail.com
Wed Jul 22 03:06:31 PDT 2009


I would like to ignore certain tokens found at the lexer level.
For example: my example source code is sprinkled with tokens
like [remove] and I want to prevent these from being seen by the parser.

This could easily be done with sed, but that is not an option since
I use TokenRewriteStream to reconstruct the full source with
modifications, and the [remove] tokens must still be there in the output.

Unfortunately there is a lookahead conflict with other valid
sequences like [ret] and the following grammar does not work:

grammar Test;
cmd :  (CMD suffix? )* ;
suffix : '[' CMD ']'  ;
CMD : 'a'..'z'+ ;
DISCARD : '[remove]' { $channel=HIDDEN; } ;
WS : (' '|'\t'|'\r'|'\n')+ { $channel=HIDDEN; } ;

Fire it up in ANTLRWorks or use antlr-3.1.3 standalone with the Java
program supplied at the end of the email. Try to parse:

alfa[x] beta[remove] gamma[ret]

[x] is properly parsed. [remove] is properly lexed and discarded.
But [ret] causes the lexer to enter DISCARD and fail. I have tried
k=4, syntactic predicates and semantic predicates, all to no avail.

The failed match also corrupts the token rewrite stream: it no
longer reconstructs the original input, because [re is simply
dropped, which results in broken source.

I did finally stumble upon a workaround, but it is ugly:

grammar Test;
cmd :  (CMD suffix? )* ;
suffix : LB CMD RB ;
CMD : 'a'..'z'+ ;
LB
    : '[' { if (input.LA(1)=='r' &&
                input.LA(2)=='e' &&
                input.LA(3)=='m' &&
                input.LA(4)=='o' &&
                input.LA(5)=='v' &&
                input.LA(6)=='e' &&
                input.LA(7)==']') {
                   match("remove]");
                   $channel=HIDDEN;
               }
          }
    ;
RB : ']' ;
WS :   (' '|'\t'|'\r'|'\n')+ { $channel=HIDDEN; } ;
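
To make the intent of the LB action concrete, here is a minimal sketch
of the same lookahead check in plain Java, without ANTLR. Everything in
it is made up for illustration (the class LookaheadSketch, the Tok type
and the lex/visible/rebuild helpers); it is not generated lexer code and
does not use the ANTLR runtime. The point it shows: on '[', inspect the
whole "[remove]" marker before committing, hide the marker if it is
there, and otherwise emit a plain '[' so that [ret] still lexes.

```java
import java.util.ArrayList;
import java.util.List;

class LookaheadSketch {
    // Hypothetical token type for this sketch; hidden tokens stay in the
    // list (for reconstruction) but are skipped by the "parser" view.
    static final class Tok {
        final String text;
        final boolean hidden;
        Tok(String text, boolean hidden) { this.text = text; this.hidden = hidden; }
    }

    static final String MARKER = "[remove]";

    static List<Tok> lex(String src) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '[') {
                // Same idea as the LB action: look ahead over the whole
                // marker before deciding which token this '[' starts.
                if (src.startsWith(MARKER, i)) {
                    out.add(new Tok(MARKER, true));
                    i += MARKER.length();
                } else {
                    out.add(new Tok("[", false));
                    i++;
                }
            } else if (c == ']') {
                out.add(new Tok("]", false));
                i++;
            } else if (c >= 'a' && c <= 'z') {
                int j = i;
                while (j < src.length() && src.charAt(j) >= 'a' && src.charAt(j) <= 'z') j++;
                out.add(new Tok(src.substring(i, j), false));   // CMD
                i = j;
            } else if (" \t\r\n".indexOf(c) >= 0) {
                int j = i;
                while (j < src.length() && " \t\r\n".indexOf(src.charAt(j)) >= 0) j++;
                out.add(new Tok(src.substring(i, j), true));    // WS, hidden
                i = j;
            } else {
                throw new IllegalArgumentException("unexpected character at offset " + i);
            }
        }
        return out;
    }

    // What a parser would see: visible tokens only, separated by spaces.
    static String visible(List<Tok> toks) {
        List<String> parts = new ArrayList<>();
        for (Tok t : toks) if (!t.hidden) parts.add(t.text);
        return String.join(" ", parts);
    }

    // What a rewrite stream should reproduce: every token, hidden or not.
    static String rebuild(List<Tok> toks) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : toks) sb.append(t.text);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Tok> toks = lex("alfa[x] beta[remove] gamma[ret]");
        System.out.println(visible(toks));  // alfa [ x ] beta gamma [ ret ]
        System.out.println(rebuild(toks));  // alfa[x] beta[remove] gamma[ret]
    }
}
```

Because the hidden tokens are kept in the list, rebuilding from all
tokens gives back the input unchanged, which is the property I need
from TokenRewriteStream.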

Is there a correct way to do it?

Thanks!

Fredrik Öhrström

----------------------------------------------

import java.io.*;
import org.antlr.runtime.*;

public class Test
{
   public static void main(String... args)
   {
      try {
         // Read the whole file. readLine() strips the line terminator,
         // so add a '\n' back; otherwise tokens on adjacent lines are
         // glued together and the rewritten output loses its line breaks.
         StringBuilder b = new StringBuilder();
         try (BufferedReader input = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = input.readLine()) != null) {
               b.append(line).append('\n');
            }
         }
         String d = b.toString();
         System.out.println("Parsing\n>"+d+"<\n");

         // Lex into a TokenRewriteStream so the original text, including
         // hidden-channel tokens, can be reconstructed after parsing.
         CharStream cs = new ANTLRStringStream(d);
         TestLexer lexer = new TestLexer(cs);
         TokenRewriteStream rew = new TokenRewriteStream(lexer);
         TestParser parser = new TestParser(rew);
         parser.cmd();

         System.out.println("Done parsing\n>"+rew.toString()+"<\n");
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}

