[antlr-interest] Help with discarding lexer tokens....

Stanislav Sokorac sokorac at gmail.com
Wed Jul 22 17:34:14 PDT 2009


The way I'd do it is to first make suffix a lexer rule, and then move it down
below the DISCARD rule so that DISCARD can still match:

grammar Test;
cmd :  (CMD SUFFIX? )* ;
CMD : 'a'..'z'+         ;
DISCARD : '[remove]' { $channel=HIDDEN; } ;
SUFFIX : '[' CMD ']'  ;
WS : (' '|'\t'|'\r'|'\n')+ { $channel=HIDDEN; } ;

This works for your example.
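
To make the ordering argument concrete, here is a minimal plain-Java sketch
that mimics the four lexer rules above with regular expressions. It is not
ANTLR-generated code: the class name RuleOrderSketch, the Tok type and the
helper methods are hypothetical names made up for illustration. Trying the
rules one by one in grammar order is also a simplification. As far as I know,
the generated ANTLR 3 lexer predicts with a DFA and only uses rule order to
break ties when two rules can match the same text, but the outcome for this
input should be the same.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration only; this is NOT code generated by ANTLR.
class RuleOrderSketch {
    // Hypothetical token type: rule name, matched text, and a flag that
    // stands in for ANTLR's HIDDEN channel.
    static final class Tok {
        final String type;
        final String text;
        final boolean hidden;
        Tok(String type, String text, boolean hidden) {
            this.type = type;
            this.text = text;
            this.hidden = hidden;
        }
    }

    // Rules listed in the same order as the grammar; DISCARD precedes SUFFIX.
    private static final String[] NAMES = {"CMD", "DISCARD", "SUFFIX", "WS"};
    private static final Pattern[] RULES = {
        Pattern.compile("[a-z]+"),
        Pattern.compile("\\[remove\\]"),
        Pattern.compile("\\[[a-z]+\\]"),
        Pattern.compile("[ \\t\\r\\n]+")
    };

    // Tokenize by taking the first rule (in list order) that matches at pos.
    static List<Tok> lex(String input) {
        List<Tok> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            for (int i = 0; i < RULES.length && !matched; i++) {
                Matcher m = RULES[i].matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt()) {
                    boolean hidden = NAMES[i].equals("DISCARD") || NAMES[i].equals("WS");
                    out.add(new Tok(NAMES[i], m.group(), hidden));
                    pos = m.end();
                    matched = true;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException("No rule matches at offset " + pos);
            }
        }
        return out;
    }

    // What the parser would see: only the tokens that are not hidden.
    static String visible(List<Tok> toks) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : toks) {
            if (!t.hidden) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(t.type).append('(').append(t.text).append(')');
            }
        }
        return sb.toString();
    }

    // What a rewrite stream can reproduce: the text of every token, hidden or not.
    static String fullText(List<Tok> toks) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : toks) sb.append(t.text);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Tok> toks = lex("alfa[x] beta[remove] gamma[ret]");
        System.out.println(visible(toks));
        System.out.println(fullText(toks));
    }
}
```

On the example input, main prints the visible tokens
CMD(alfa) SUFFIX([x]) CMD(beta) CMD(gamma) SUFFIX([ret]) and then the input
line unchanged. [remove] is still a token, just a hidden one, which is why it
can survive in the rewritten output while the parser never sees it.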

Stan

On Wed, Jul 22, 2009 at 6:06 AM, Fredrik Ohrstrom <oehrstroem at gmail.com> wrote:

> I would like to ignore certain tokens found at the lexer level.
> For example: my example source code is sprinkled with tokens
> like [remove] and I want to prevent these from being seen by the parser.
>
> This could easily be done with sed, but this is not an option since
> I use TokenRewriteStream to reconstruct the full source with
> modifications and the [remove] tokens must be there.
>
> Unfortunately there is a lookahead conflict with other valid
> sequences like [ret] and the following grammar does not work:
>
> grammar Test;
> cmd :  (CMD suffix? )* ;
> suffix : '[' CMD ']'  ;
> CMD : 'a'..'z'+         ;
> DISCARD : '[remove]' { $channel=HIDDEN; } ;
> WS : (' '|'\t'|'\r'|'\n')+ { $channel=HIDDEN; } ;
>
> Fire it up in antlrworks or use antlr-3.1.3 standalone with the Java
> program supplied at the end of the email. Try to parse:
>
> alfa[x] beta[remove] gamma[ret]
>
> [x] is properly parsed. [remove] is properly lexed and discarded.
> But [ret] causes the lexer to enter DISCARD and fail. I have tried
> k=4, syntactic predicates, semantic predicates to no avail.
>
> Also, the failed lex messes up the token rewrite stream so that
> it does not reconstruct the original stream: [re is simply
> dropped, which results in broken source.
>
> I did finally stumble upon a solution, but it is ugly.
>
> grammar Test;
> cmd     :  (CMD suffix? )* ;
> suffix : LB CMD RB ;
> CMD :   'a'..'z'+ ;
> LB
>    : '[' { if (input.LA(1)=='r' &&
>                input.LA(2)=='e' &&
>                input.LA(3)=='m' &&
>                input.LA(4)=='o' &&
>                input.LA(5)=='v' &&
>                input.LA(6)=='e' &&
>                input.LA(7)==']') {
>                   match("remove]");
>                   $channel=HIDDEN;
>               }
>          }
>    ;
> RB : ']' ;
> WS :   (' '|'\t'|'\r'|'\n')+ { $channel=HIDDEN; } ;
>
> Is there a correct way to do it?
>
> Thanks!
>
> Fredrik Öhrström
>
> ----------------------------------------------
>
> import java.io.*;
> import org.antlr.runtime.*;
>
> public class Test
> {
>   public static void main(String... args)
>   {
>      try {
>         StringBuffer b = new StringBuffer();
>         String line = null;
>         BufferedReader input = new BufferedReader(new FileReader(args[0]));
>         for (;;) {
>            line = input.readLine();
>            if (line==null) break;
>            // readLine() strips the line terminator, so put it back
>            b.append(line).append('\n');
>         }
>         String d = b.toString();
>         System.out.println("Parsing\n>"+d+"<\n");
>         CharStream cs = new ANTLRStringStream(d);
>         TestLexer lexer = new TestLexer(cs);
>         TokenRewriteStream rew = new TokenRewriteStream(lexer);
>         TestParser parser = new TestParser(rew);
>         parser.cmd();
>
>         System.out.println("Done parsing\n>"+rew.toString()+"<\n");
>      } catch (Exception e) {
>         e.printStackTrace();
>      }
>   }
> }