[antlr-interest] misunderstanding channel HIDDEN

Wed Aug 26 11:47:00 PDT 2009

Your BLAH rule matches the literal character sequence 'blah', so the lexer
has no way to match UCODE between those characters. You want something
like this:

startrule: blah;  /* You probably also want EOF here; otherwise the parser
will accept "blahblah" without reporting an error */

blah: B L A H;
UCODE   : '\u0000'{ $channel = HIDDEN; };
B: 'b';
L: 'l';
A: 'a';
H: 'h';

You might be able to keep BLAH as a lexer rule, but I doubt that you can
generate a full token (UCODE) in the middle of another token (the blah).

This does mean that many of your basic "tokens" will actually be parser
rules, which probably has a negative impact on efficiency.  Assuming
your text is whitespace delimited, you'll also need extra bits in the
parser rules to ensure that you've reached the end of a "token".  In
all, it might be simpler to filter the input stream and strip out the ^@
(NUL, '\u0000') characters where they're not meaningful.
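
A minimal sketch of that filtering idea in Java, assuming you are using
the Java target and that no ^@ in your input is meaningful.
NulStrippingReader and strip are hypothetical names, not part of ANTLR.
The wrapped reader could then presumably be handed to the ANTLR 3
runtime's ANTLRReaderStream instead of using ANTLRInputStream directly.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper: drops every U+0000 character from the wrapped reader. */
class NulStrippingReader extends FilterReader {

    NulStrippingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c;
        // Skip NULs; stop on any other character or on end of stream (-1).
        do {
            c = super.read();
        } while (c == 0);
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n <= 0) {
                return n; // end of stream (-1), or nothing read
            }
            // Compact the chunk in place, keeping only non-NUL characters.
            int kept = 0;
            for (int i = 0; i < n; i++) {
                char ch = buf[off + i];
                if (ch != '\0') {
                    buf[off + kept++] = ch;
                }
            }
            if (kept > 0) {
                return kept;
            }
            // The whole chunk was NULs; read again rather than return 0.
        }
    }

    /** Convenience for small inputs: returns text with all NULs removed. */
    static String strip(String text) {
        StringBuilder out = new StringBuilder(text.length());
        char[] buf = new char[1024];
        try (Reader r = new NulStrippingReader(new StringReader(text))) {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

With this in place, NulStrippingReader.strip("b\0l\0a\0h\0") gives "blah",
so the original BLAH lexer rule can match it as a single token and UCODE
is no longer needed for this case.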

Troy

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Ian Eyberg
> Sent: Wednesday, August 26, 2009 2:14 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] misunderstanding channel HIDDEN
> 
> Hi,
>   I think I'm misunderstanding the usage of $channel = HIDDEN
> or skip().
> 
> I have text that looks like:
> 
>   'b^@l^@a^@h^@'
> 
> (most of the time the text is simply 'blah')
> and then it should come out like this:
> 
>   'blah'
> 
> my relevant rules are:
> 
>   startrule : BLAH;
>   BLAH    : 'blah';
>   UCODE   : '\u0000'{ $channel = HIDDEN; };
> 
> I'm reading in through ANTLRInputStream as "UTF8" because I do
> want to support multi-byte chars, and I have rules to help
> with that, such as:
> 
> UNICODE : ('\u00a0'..'\uffff');
> 
> What am I doing wrong here?
> 
> Thanks,
> Ian