[antlr-interest] misunderstanding channel HIDDEN
Daniels, Troy (US SSA)
troy.daniels at baesystems.com
Wed Aug 26 11:47:00 PDT 2009
Your BLAH rule doesn't know that a UCODE character can appear between its characters.
You want something like this.
startrule: blah EOF; /* Without EOF, the parser will also successfully
run against "blahblah" */
blah: B L A H;
UCODE : '\u0000'{ $channel = HIDDEN; };
B: 'b';
L: 'l';
A: 'a';
H: 'h';
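To see why the parser-rule version works, here is a toy Java model of what happens. It is not ANTLR code: ChannelDemo, Tok and the channel constants are hypothetical names, and the sketch assumes one token per character, as in the grammar above. The lexer still emits a UCODE token for each ^@, but because that token sits on the hidden channel, the parser only ever sees B L A H.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: a toy model of how off-channel tokens are
// hidden from the parser. All names and constants here are hypothetical.
class ChannelDemo {
    static final int DEFAULT = 0;
    static final int HIDDEN = 99; // toy value standing in for the hidden channel

    static final class Tok {
        final String type;
        final int channel;
        Tok(String type, int channel) { this.type = type; this.channel = channel; }
    }

    // Toy lexer mirroring the grammar above: one token per character,
    // with NUL (^@) mapped to UCODE on the hidden channel.
    static List<Tok> lex(String input) {
        List<Tok> toks = new ArrayList<>();
        for (char c : input.toCharArray()) {
            switch (c) {
                case '\0': toks.add(new Tok("UCODE", HIDDEN)); break;
                case 'b': toks.add(new Tok("B", DEFAULT)); break;
                case 'l': toks.add(new Tok("L", DEFAULT)); break;
                case 'a': toks.add(new Tok("A", DEFAULT)); break;
                case 'h': toks.add(new Tok("H", DEFAULT)); break;
                default: throw new IllegalArgumentException("no rule matches '" + c + "'");
            }
        }
        toks.add(new Tok("EOF", DEFAULT));
        return toks;
    }

    // What the parser sees: only the default-channel token types.
    static List<String> parserView(List<Tok> toks) {
        List<String> seen = new ArrayList<>();
        for (Tok t : toks) {
            if (t.channel == DEFAULT) seen.add(t.type);
        }
        return seen;
    }

    // Models: startrule : blah EOF ;  blah : B L A H ;
    static boolean matchesStartRule(String input) {
        return parserView(lex(input)).equals(Arrays.asList("B", "L", "A", "H", "EOF"));
    }

    public static void main(String[] args) {
        System.out.println(matchesStartRule("b\0l\0a\0h\0")); // true
        System.out.println(matchesStartRule("blah"));         // true
        System.out.println(matchesStartRule("blahblah"));     // false
    }
}
```

A single lexer rule like BLAH : 'blah'; has no equivalent step: the lexer matches raw characters, so a NUL between 'b' and 'l' simply breaks the match.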
You might be able to keep BLAH as a lexer rule, but I doubt that you can
generate a full token (UCODE) in the middle of another token (the blah).
This does mean that many of your basic "tokens" will actually be parser
rules, which probably has a negative impact on efficiency. Assuming
your text is whitespace delimited, you'll also need extra bits in the
parser rules to ensure that you've reached the end of a "token". All in
all, it might be simpler to filter the input stream and strip out the ^@
(NUL, '\u0000') characters where they're not meaningful.
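Filtering the input before ANTLR sees it could look something like the minimal Java sketch below. NulStrippingReader and stripAll are hypothetical names, not part of ANTLR, and unlike the suggestion above this sketch drops every NUL unconditionally, so it assumes ^@ is never meaningful in your data.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of ANTLR: a Reader that silently drops
// every NUL character (shown as ^@ in the original text).
class NulStrippingReader extends FilterReader {
    NulStrippingReader(Reader in) { super(in); }

    @Override
    public int read() throws IOException {
        int c;
        do {
            c = in.read();
        } while (c == '\0');
        return c; // -1 at end of stream
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        while (true) {
            int n = in.read(buf, off, len);
            if (n == -1) return -1;
            // Compact the chunk in place, keeping only non-NUL characters.
            int kept = 0;
            for (int i = 0; i < n; i++) {
                char ch = buf[off + i];
                if (ch != '\0') buf[off + kept++] = ch;
            }
            // If the whole chunk was NUL, read again instead of returning 0.
            if (kept > 0) return kept;
        }
    }

    // Convenience: drain a Reader into a String with all NULs removed.
    static String stripAll(Reader r) {
        StringBuilder sb = new StringBuilder();
        try (Reader filtered = new NulStrippingReader(r)) {
            char[] buf = new char[4096];
            int n;
            while ((n = filtered.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripAll(new StringReader("b\0l\0a\0h\0"))); // prints "blah"
    }
}
```

For a file, the underlying Reader would be something like new InputStreamReader(stream, StandardCharsets.UTF_8), keeping the UTF-8 decoding. With the ANTLR 3 Java runtime, the cleaned String could then go into an ANTLRStringStream (or the filtering Reader into an ANTLRReaderStream) in place of ANTLRInputStream. Check the runtime version you are using.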
Troy
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Ian Eyberg
> Sent: Wednesday, August 26, 2009 2:14 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] misunderstanding channel HIDDEN
>
> Hi,
> I think I'm misunderstanding the usage of $channel = HIDDEN
> or skip().
>
> I have text that looks like:
>
> 'b^@l^@a^@h^@'
>
> (most of the time the text is simply 'blah')
> and then it should come out like this:
>
> 'blah'
>
> my relevant rules are:
>
> startrule : BLAH;
> BLAH : 'blah';
> UCODE : '\u0000'{ $channel = HIDDEN; };
>
> I'm reading the input through ANTLRInputStream as "UTF8", as I do
> want to support multi-byte chars, and I have rules to help with
> that, such as:
>
> UNICODE : ('\u00a0'..'\uffff');
>
> What am I doing wrong here?
>
> Thanks,
> Ian
>