[antlr-interest] antlr-interest Digest, Vol 27, Issue 48

Mon Feb 26 06:46:03 PST 2007

>> lexer grammar DUMMY_Lexer;
>> options { filter=true; }
>>
>> INT          : 'int' ;
>> SEMI         : ';' ;
>> WS           :  (  ' '| '\t'| '\r' | '\n' )+ {$channel=HIDDEN;} ;
>> IDENTIFIER   : ('a'..'z'|'A'..'Z'|'_')+;
>
>Why are you using the filter option? It makes ANTLR try the token rules
>one by one, in the order they are listed, moving on to the next rule only
>if the current one does not match. So on the input 'intt' it will match an
>INT token first, followed by the IDENTIFIER 't'. When you remove the filter
>option, it should match a single IDENTIFIER token.

I guess the real reason is I am lazy. I did not want to tokenize 
everything contained in the input (I could have used the skip feature - 
but I was too lazy for that too!).

I still don't understand why the lexer would split the input in the
middle of text that another rule (IDENTIFIER) could match as a whole,
or what that has to do with filter=true. Perhaps an example would help
me get that.
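
To make the question concrete, here is how I currently read the
explanation above, as a rough Python sketch rather than anything ANTLR
actually generates. The names RULES and tokenize are made up for
illustration, and treating the non-filter case as "longest match wins"
is my assumption, only an approximation of what the real lexer does.

```python
import re

# Rules in grammar order, mirroring DUMMY_Lexer (a hypothetical stand-in,
# not code generated by ANTLR).
RULES = [
    ("INT", re.compile(r"int")),
    ("SEMI", re.compile(r";")),
    ("WS", re.compile(r"[ \t\r\n]+")),
    ("IDENTIFIER", re.compile(r"[A-Za-z_]+")),
]


def tokenize(text, first_match):
    """Return (rule, lexeme) pairs for text.

    first_match=True  imitates the described filter behaviour: rules are
                      tried in order and the first one that matches wins.
    first_match=False assumes the longest match wins (ties go to the
                      earlier rule), as an approximation of filter=false.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        candidates = []
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m:
                candidates.append((name, m.group()))
                if first_match:
                    break  # stop at the first rule that matches
        if not candidates:
            pos += 1  # simplification: skip characters no rule matches
            continue
        if first_match:
            name, lexeme = candidates[0]
        else:
            # max() keeps the earliest candidate among equal lengths
            name, lexeme = max(candidates, key=lambda c: len(c[1]))
        if name != "WS":  # WS goes to the hidden channel, so leave it out
            tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens


if __name__ == "__main__":
    print(tokenize("intt", first_match=True))
    print(tokenize("intt", first_match=False))
```

If that reading is right, the first call splits 'intt' into INT 'int'
plus IDENTIFIER 't', because INT is tried before IDENTIFIER and never
looks at what follows, while the second call yields one IDENTIFIER.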

Cheers,
Martin