[antlr-interest] Help with discarding lexer tokens....

Mon Jul 27 10:10:50 PDT 2009

I tried explicit tokens as well, but the lexer still uses too little lookahead.
I did finally solve it in a reasonable way: the grammar
stays unchanged, and I can add any number of weird discard rules.
:-)

grammar Test;
cmd :  (CMD suffix? )* ;
suffix : '[' CMD ']'  ;
CMD : 'a'..'z'+ ;
// Gated predicate: only attempt DISCARD when the next 8 characters spell "[remove]".
DISCARD : { (input.LA(1)=='[' &&
                      input.LA(2)=='r' &&
                      input.LA(3)=='e' &&
                      input.LA(4)=='m' &&
                      input.LA(5)=='o' &&
                      input.LA(6)=='v' &&
                      input.LA(7)=='e' &&
                      input.LA(8)==']') }?=> '[remove]' { $channel=HIDDEN; } ;
WS : (' '|'\t'|'\r'|'\n')+ { $channel=HIDDEN; } ;
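
For anyone who wants to check the lookahead logic outside ANTLR, here is a
minimal plain-Java sketch of the same idea. It is not ANTLR-generated code:
the class name DiscardLexerSketch and the methods lookingAt and visibleTokens
are hypothetical names made up for this illustration, and the token set is
reduced to CMD, '[', ']' and whitespace as in the grammar above.

```java
import java.util.ArrayList;
import java.util.List;

public class DiscardLexerSketch {
    // The marker to hide from the parser; the post uses "[remove]".
    static final String DISCARD = "[remove]";

    // Hypothetical helper mirroring the chain of input.LA(k)=='x' tests:
    // true only if the characters starting at pos spell out literal exactly.
    static boolean lookingAt(String input, int pos, String literal) {
        for (int k = 0; k < literal.length(); k++) {
            int i = pos + k;
            if (i >= input.length() || input.charAt(i) != literal.charAt(k)) {
                return false;
            }
        }
        return true;
    }

    // Returns the token texts a parser would see; discard markers and
    // whitespace are consumed but not emitted (the HIDDEN channel, roughly).
    static List<String> visibleTokens(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (lookingAt(input, pos, DISCARD)) {
                pos += DISCARD.length();            // DISCARD: hidden
            } else if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                pos++;                              // WS: hidden
            } else if (c >= 'a' && c <= 'z') {
                int start = pos;                    // CMD: 'a'..'z'+
                while (pos < input.length()
                        && input.charAt(pos) >= 'a' && input.charAt(pos) <= 'z') {
                    pos++;
                }
                out.add(input.substring(start, pos));
            } else if (c == '[' || c == ']') {
                out.add(String.valueOf(c));         // literal brackets
                pos++;
            } else {
                throw new IllegalArgumentException(
                        "unexpected character '" + c + "' at " + pos);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(visibleTokens("foo[remove] bar[baz] [remove]qux"));
    }
}
```

The full-length check runs before '[' is accepted as an ordinary token, so
input such as "[removed]" still comes through as '[', CMD, ']'. In the grammar
itself, a loop like lookingAt could probably live in a @lexer::members block
to shorten each discard rule, but I have not tried that here.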

Thanks!

2009/7/23 David-Sarah Hopwood <david-sarah at jacaranda.org>:
> David-Sarah Hopwood wrote:
>> Fredrik Ohrstrom wrote:
>>> I would like to ignore certain tokens found at the lexer level.
>>> For example: my example source code is sprinkled with tokens
>>> like [remove] and I want to prevent these from being seen by the parser.
>> [...]
>>> I did finally stumble upon a solution, but it is ugly.
>>>
>>> grammar Test;
>>> cmd  :  (CMD suffix? )* ;
>>> suffix : LB CMD RB ;
>>
>>   suffix : LB c=CMD RB
>>     { if ($c != null && $c.text.equals("remove")) $channel = HIDDEN; } ;
>
> Sorry, setting $channel only makes sense in a lexer rule, so this won't
> work as written.
>
> Rather than using the explicit test above, I think it is probably more
> elegant to declare "[remove]" in the tokens block (which gives it precedence
> over other rules that it would otherwise be ambiguous with), like this:
>
>  grammar Test;
>
>  tokens {
>    REMOVE = '[remove]';
>  }
>
>  // should probably rename this rule for clarity
>  cmd : (Cmd Suffix?)* ;
>
>  Remove : REMOVE { $channel = HIDDEN; } ;
>
>  Suffix : '[' CMD ']' ;
>
>  Cmd : CMD ;
>
>  fragment CMD : ('a'..'z')+ ;
>
>  WS : (' '|'\t'|'\r'|'\n')+ { $channel=HIDDEN; } ;
>
> This allows whitespace between Cmd and Suffix, or between multiple
> Suffixes. It wasn't clear from your original post whether you want
> to allow whitespace there or not.
>
>> Thanks! But as I wrote in the other email, suffix is
>> unfortunately really complicated and occurs in
>> several different places in the parser.
>
> Given the correction above, is there still a problem?
> Suffix could be made arbitrarily complicated and used in any number
> of places.
>
> --
> David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com