[antlr-interest] delete space token
Gavin Lambert
antlr at mirality.co.nz
Tue Apr 1 01:41:28 PDT 2008
At 20:18 1/04/2008, elekis wrote:
>lexer grammar Lyaflplug;
>
>TAG_DEFINITION_MODULE :'DEFINITION MODULE';
>TAG_POINT_VIRGULE :';';
>Ident :LETTER NAMECHAR*;
>fragment NAMECHAR :LETTER | DIGIT | '-' | '_';
>fragment LETTER :'a'..'z'| 'A'..'Z';
>fragment DIGIT :'0'..'9';
>WS :(' '|'\r'|'\t'|'\u000C'|'\n') {channel=99;};
>
>based on the xml tutorial
>
>but when I print token I have that
>
>Token: DEFINITION MODULE
>Token:
>Token: helloworld
>Token:
>Token: ;
>
>
>he print all space. WHY??
Setting "channel=99" (which, incidentally, should probably be
"$channel=HIDDEN") doesn't stop the tokens from being generated;
they're just given a different channel id. If you're reading the
output of the lexer directly, you'll still see all of them.
I don't recall whether it's the token stream or the parser itself
that filters down to a single channel, but either way the parser
will normally only see tokens on one specific channel, although
it's possible to locate nearby tokens on other channels in target
language code blocks (which can be useful for disambiguation).
On the other hand, if you don't want it to generate a token at
all, then you can call skip() in the rule's action instead.
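To make the difference concrete, here is a rough sketch of the two
variants of your WS rule. It assumes ANTLR 3 with the Java target;
the "+" after the group is my own addition (not in your grammar) so
that a run of whitespace becomes one token instead of one token per
character.

```
// Option 1: keep the token but move it to the hidden channel.
// The lexer still emits it, so code that reads the lexer directly
// will print it; a consumer that filters by channel will not see it.
WS : (' '|'\r'|'\t'|'\u000C'|'\n')+ { $channel=HIDDEN; } ;

// Option 2 (use INSTEAD of option 1, not alongside it):
// discard the match so that no token is emitted at all.
// WS : (' '|'\r'|'\t'|'\u000C'|'\n')+ { skip(); } ;
```

Option 1 is the one to pick if you ever want to get at the
whitespace later (for example to reproduce the original layout);
option 2 is simpler if you never care about it.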
>other thing is it possible to put directly token in the parser I
>mean write a rule like that
>
>compilationUnit: 'DEFINITION MODULE' Ident ';';
>
>than that
>
>compilationUnit: TAG_DEFINITION_MODULE Ident TAG_POINT_VIRGULE;
You can do this if you write a combined grammar ("grammar foo"
instead of "lexer grammar foo" and "parser grammar bar"), but
personally I think this just makes things more confusing. The
generated code is harder to read (because you've now got T12
instead of TAG_DEFINITION_MODULE, for example), and it's too easy
to forget the separation between lexing and parsing, and end up
with conflicting or ambiguous lexer rules.
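For reference, a combined version of your grammar might look
something like the sketch below. Again this assumes ANTLR 3 with the
Java target; the grammar name and rules are taken from your post,
and I haven't run it through the tool, so treat it as an
illustration only.

```
grammar Lyaflplug;   // combined grammar: parser and lexer rules together

// Parser rule using literals directly. ANTLR creates an
// automatically named lexer token for each quoted literal.
compilationUnit : 'DEFINITION MODULE' Ident ';' ;

// Lexer rules, same as in the separate lexer grammar.
Ident : LETTER NAMECHAR* ;
fragment NAMECHAR : LETTER | DIGIT | '-' | '_' ;
fragment LETTER : 'a'..'z' | 'A'..'Z' ;
fragment DIGIT : '0'..'9' ;
WS : (' '|'\r'|'\t'|'\u000C'|'\n')+ { $channel=HIDDEN; } ;
```

Note that the literals still become lexer tokens behind the scenes,
so they can still clash with rules like Ident; the combined form
just hides that from you.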