[antlr-interest] Help with discarding lexer tokens....

Thu Jul 23 14:39:54 PDT 2009

David-Sarah Hopwood wrote:
> Fredrik Ohrstrom wrote:
>> I would like to ignore certain tokens found at the lexer level.
>> For example: my example source code is sprinkled with tokens
>> like [remove] and I want to prevent these from being seen by the parser.
> [...]
>> I did finally stumble upon a solution, but it is ugly.
>>
>> grammar Test;
>> cmd	:  (CMD suffix? )* ;
>> suffix : LB CMD RB ;
> 
>   suffix : LB c=CMD RB
>     { if ($c != null && $c.text.equals("remove")) $channel = HIDDEN; } ;

Sorry, setting $channel only makes sense in a lexer rule: a token's channel
is assigned by the lexer when the token is emitted, and suffix is a parser
rule, so this won't work as written.

Rather than using the explicit test above, I think it is probably more
elegant to declare "[remove]" in the tokens block (which gives it precedence
over other rules that it would otherwise be ambiguous with), like this:

  grammar Test;

  tokens {
    REMOVE = '[remove]';
  }

  // should probably rename this rule for clarity
  cmd : (Cmd Suffix?)* ;

  Remove : REMOVE { $channel = HIDDEN; } ;

  Suffix : '[' CMD ']' ;

  Cmd : CMD ;

  fragment CMD : ('a'..'z')+ ;

  WS : (' '|'\t'|'\r'|'\n')+ { $channel=HIDDEN; } ;

This allows whitespace between Cmd and Suffix, or between multiple
Suffixes. It wasn't clear from your original post whether you want
to allow whitespace there or not.
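
To make the effect of the hidden channel concrete, here is a rough Python
simulation of the idea. It is not ANTLR output and not how ANTLR is
implemented; the names TOKEN_SPEC, lex and visible are made up for this
sketch, and the channel numbers are only illustrative. The lexer still
produces every token, but the parser reads only those on the default channel.

```python
import re

# Illustrative channel numbers (assumption: only "default" vs "hidden" matters here).
DEFAULT, HIDDEN = 0, 99

# Order matters: the literal '[remove]' is tried before the general suffix pattern,
# mimicking the precedence that the tokens block is meant to give it.
TOKEN_SPEC = [
    ("Remove", re.compile(r"\[remove\]"), HIDDEN),
    ("Suffix", re.compile(r"\[[a-z]+\]"), DEFAULT),
    ("Cmd",    re.compile(r"[a-z]+"),     DEFAULT),
    ("WS",     re.compile(r"[ \t\r\n]+"), HIDDEN),
]

def lex(text):
    """Return (type, text, channel) for every token, hidden ones included."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern, channel in TOKEN_SPEC:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group(), channel))
                pos = m.end()
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
    return tokens

def visible(tokens):
    """What a parser reading only the default channel would see."""
    return [(name, txt) for name, txt, channel in tokens if channel == DEFAULT]

tokens = lex("foo [remove] [bar] baz")
print(visible(tokens))
```

For the input "foo [remove] [bar] baz" the lexer emits seven tokens, but
only Cmd "foo", Suffix "[bar]" and Cmd "baz" remain visible, so the
[remove] marker and the whitespace never reach the parser rule.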

> Thanks! But as I wrote in the other email, suffix is
> unfortunately really complicated and occurs in
> several different places in the parser.

Given the correction above, is there still a problem?
Suffix could be made arbitrarily complicated and used in any number
of places.

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com