[antlr-interest] delete space token

Gavin Lambert antlr at mirality.co.nz
Tue Apr 1 01:41:28 PDT 2008


At 20:18 1/04/2008, elekis wrote:
 >lexer grammar Lyaflplug;
 >
 >TAG_DEFINITION_MODULE 		:'DEFINITION MODULE';
 >TAG_POINT_VIRGULE			:';';
 >Ident						:LETTER NAMECHAR*;
 >fragment NAMECHAR			:LETTER | DIGIT | '-' | '_';
 >fragment LETTER				:'a'..'z'| 'A'..'Z';
 >fragment DIGIT				:'0'..'9';
 >WS  			:(' '|'\r'|'\t'|'\u000C'|'\n') {channel=99;};
 >
 >This is based on the XML tutorial,
 >
 >but when I print the tokens I get this:
 >
 >Token: DEFINITION MODULE
 >Token:
 >Token: helloworld
 >Token:
 >Token: ;
 >
 >
 >It prints all the spaces. Why?

When you say "channel=99" (and incidentally, this should probably 
be "$channel=HIDDEN"), tokens are still generated, they're just 
given a different channel id.  If you're reading the output of the 
lexer directly you'll still see all of them.
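
For illustration, here is a minimal sketch of the whitespace rule 
written that way. It assumes ANTLR 3 with the Java target, where 
HIDDEN is the runtime's predefined constant for channel 99; the 
rule body is otherwise unchanged from the quoted grammar.

```antlr
// Whitespace tokens are still emitted by the lexer, but they are
// tagged with the hidden channel instead of the default one.
WS : (' '|'\r'|'\t'|'\u000C'|'\n') { $channel = HIDDEN; } ;
```

With this rule, code that pulls tokens straight from the lexer 
still sees a WS token for every whitespace character.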

I don't recall whether it's the token stream or the parser itself 
that filters down to a single channel, but either way the parser 
will normally only see tokens on one specific channel, although 
it's possible to locate nearby tokens on other channels in target 
language code blocks (which can be useful for disambiguation).

On the other hand, if you don't want it to generate a token at 
all, then you can call skip() instead.
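
As a rough sketch of that alternative (again assuming ANTLR 3 with 
the Java target, and reusing the WS rule from the quoted grammar):

```antlr
// skip() tells the lexer to discard the matched text and move on
// to the next token, so no WS token is produced at all.
WS : (' '|'\r'|'\t'|'\u000C'|'\n') { skip(); } ;
```

The trade-off is that skipped whitespace cannot be recovered 
later, whereas tokens on a hidden channel remain in the token 
stream if you ever need to look at them.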

 >Another thing: is it possible to put tokens directly in the
 >parser? I mean writing a rule like this
 >
 >compilationUnit: 'DEFINITION MODULE' Ident ';';
 >
 >rather than this
 >
 >compilationUnit: TAG_DEFINITION_MODULE Ident TAG_POINT_VIRGULE;

You can do this if you write a combined grammar ("grammar foo" 
instead of "lexer grammar foo" and "parser grammar bar"), but 
personally I think this just makes things more confusing.  The 
generated code is harder to read (because you've now got T12 
instead of TAG_DEFINITION_MODULE, for example), and it's too easy 
to forget the separation between lexing and parsing, and end up 
with conflicting or ambiguous lexer rules.
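
For comparison, a combined grammar along those lines might look 
roughly like the sketch below. This assumes ANTLR 3 and simply 
reuses the grammar name and rules from the quoted post; it is an 
illustration, not a tested grammar.

```antlr
// "grammar" (not "lexer grammar") makes this a combined grammar.
grammar Lyaflplug;

// Parser rule using literals directly; ANTLR creates implicit,
// automatically named token types for 'DEFINITION MODULE' and ';'.
compilationUnit : 'DEFINITION MODULE' Ident ';' ;

// Lexer rules live in the same file.
Ident             : LETTER NAMECHAR* ;
fragment NAMECHAR : LETTER | DIGIT | '-' | '_' ;
fragment LETTER   : 'a'..'z' | 'A'..'Z' ;
fragment DIGIT    : '0'..'9' ;
WS : (' '|'\r'|'\t'|'\u000C'|'\n') { $channel = HIDDEN; } ;
```

Each literal in a parser rule still becomes a lexer rule behind 
the scenes, which is why it can silently conflict with an 
explicit rule such as Ident.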


