[antlr-interest] Help with discarding lexer tokens....
David-Sarah Hopwood
david-sarah at jacaranda.org
Thu Jul 23 14:39:54 PDT 2009
David-Sarah Hopwood wrote:
> Fredrik Ohrstrom wrote:
>> I would like to ignore certain tokens found at the lexer level.
>> For example: my example source code is sprinkled with tokens
>> like [remove] and I want to prevent these from being seen by the parser.
> [...]
>> I did finally stumble upon a solution, but it is ugly.
>>
>> grammar Test;
>> cmd : (CMD suffix? )* ;
>> suffix : LB CMD RB ;
>
> suffix : LB c=CMD RB
> { if ($c != null && $c.text.equals("remove")) $channel = HIDDEN; } ;
Sorry, $channel can only be set in a lexer rule, and suffix is a parser
rule, so this won't work as written.
Rather than using the explicit test above, I think it is probably more
elegant to declare "[remove]" in the tokens block (which gives it precedence
over other rules that it would otherwise be ambiguous with), like this:
grammar Test;
tokens {
REMOVE = '[remove]';
}
// should probably rename this rule for clarity
cmd : (Cmd Suffix?)* ;
Remove : REMOVE { $channel = HIDDEN; } ;
Suffix : '[' CMD ']' ;
Cmd : CMD ;
fragment CMD : ('a'..'z')+ ;
WS : (' '|'\t'|'\r'|'\n')+ { $channel=HIDDEN; } ;
This allows whitespace between Cmd and Suffix, or between multiple
Suffixes. It wasn't clear from your original post whether you want
to allow whitespace there or not.
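To make the hidden-channel idea concrete, here is a minimal hand-written sketch in plain Java. It does not use ANTLR and is not what ANTLR generates. The class name HiddenChannelSketch, the helper visibleTypes, and the channel value 99 are assumptions made for illustration, and the "first rule that matches wins" ordering is a simplification of how the generated lexer resolves the ambiguity between '[remove]' and '[' CMD ']'. It only shows the effect: the lexer still produces Remove and WS tokens, but they are tagged as hidden and never reach the parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HiddenChannelSketch {
    static final int DEFAULT = 0;
    static final int HIDDEN = 99; // arbitrary value chosen for this sketch

    static final class Token {
        final String type;
        final String text;
        final int channel;

        Token(String type, String text, int channel) {
            this.type = type;
            this.text = text;
            this.channel = channel;
        }
    }

    // Rules are tried in this order; the literal "[remove]" comes first so it
    // takes precedence over the more general "[cmd]" suffix rule.
    private static final String[] TYPES = {"Remove", "Suffix", "Cmd", "WS"};
    private static final Pattern[] PATTERNS = {
        Pattern.compile("\\[remove\\]"),
        Pattern.compile("\\[[a-z]+\\]"),
        Pattern.compile("[a-z]+"),
        Pattern.compile("[ \\t\\r\\n]+"),
    };

    // Tokenize the whole input, tagging Remove and WS tokens as hidden.
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            for (int i = 0; i < PATTERNS.length; i++) {
                Matcher m = PATTERNS[i].matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt()) {
                    String type = TYPES[i];
                    boolean hidden = type.equals("Remove") || type.equals("WS");
                    tokens.add(new Token(type, m.group(), hidden ? HIDDEN : DEFAULT));
                    pos = m.end();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException("No rule matches at offset " + pos);
            }
        }
        return tokens;
    }

    // What a parser would see: only the types of tokens on the default channel.
    static List<String> visibleTypes(String input) {
        List<String> out = new ArrayList<>();
        for (Token t : lex(input)) {
            if (t.channel == DEFAULT) {
                out.add(t.type);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Cmd, Cmd, Suffix]
        System.out.println(visibleTypes("ls[remove] cd [x]"));
    }
}
```

Because the filtering happens between lexer and parser, no parser rule has to mention [remove] at all, however many places Suffix is used in.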
> Thanks! But as I wrote in the other email, suffix is
> unfortunately really complicated and occurs in
> several different places in the parser.
Given the correction above, is there still a problem?
Suffix could be made arbitrarily complicated and used in any number
of places.
--
David-Sarah Hopwood ⚥ http://davidsarah.livejournal.com