[antlr-interest] Is it safe to overwrite the Lexer's text?

Tue Aug 28 06:23:12 PDT 2007

On 8/28/07, Bjoern Doebel <doebel at tudos.org> wrote:
> Gavin Lambert wrote:
> > At 00:11 29/08/2007, Bjoern Doebel wrote:
> >>I have a language consisting of source code and comments. I have a
> >>lexer and parser that create an AST of it. I would like to use the
> >>created syntax tree as a starting point for several tools. Some of
> >>them will need access to the comments, some of them won't. I
> >>cannot just ignore the comments, but I also don't want to store
> >>all of them inside the AST. Instead, inside the lexer I recognize
> >>comments, copy them (current value of self.text) into a separate
> >>table and replace the comment's text by overwriting self.text with
> >>an index referencing the comment's table entry.
> >
> > Isn't that what channels are for?
>
> Maybe. TDAR isn't very verbose on this topic and only mentions hidden
> channels for whitespace. From my reading I understood that parsers can
> only read from exactly one channel (TDAR, p. 25). However, for my purposes
> I would need only the non-comment channel in some cases and both channels
> in other cases. Is there any documentation on how to do this with channels?
>
I think channel handling is still something of a work in progress, and
the current support is rather basic. CommonTokenStream only lets a
parser read from one channel, and there doesn't appear to be any direct
way to access off-channel tokens.
You can override the channel for given token types (see
CommonTokenStream.setTokenTypeChannel). It looks like this is primarily
there to support the interpreter, but you could use it to put comment
tokens back onto the default channel when you want to process them. Or
you could implement a subclass that reads from multiple channels rather
than just one.
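
To make the two options concrete, here is a minimal sketch in Python
(chosen because the original post uses self.text). It does not use the
ANTLR runtime: Token, MultiChannelStream, set_token_type_channel and
the channel number 2 are made-up stand-ins, so treat it as an
illustration of the idea rather than working ANTLR code.

```python
from dataclasses import dataclass

DEFAULT_CHANNEL = 0  # assumed numbering for the channel the parser reads
COMMENT_CHANNEL = 2  # hypothetical channel chosen for comments


@dataclass
class Token:
    type: str
    text: str
    channel: int = DEFAULT_CHANNEL


class MultiChannelStream:
    """Toy stand-in for a token stream; not the ANTLR runtime class."""

    def __init__(self, tokens, channels=(DEFAULT_CHANNEL,)):
        self.tokens = list(tokens)
        self.channels = set(channels)  # channels the parser may read
        self.overrides = {}            # token type -> forced channel

    def set_token_type_channel(self, ttype, channel):
        # Force every token of this type onto the given channel.
        self.overrides[ttype] = channel

    def _channel_of(self, tok):
        return self.overrides.get(tok.type, tok.channel)

    def visible(self):
        # Tokens a parser reading from this stream would see, in order.
        return [t for t in self.tokens if self._channel_of(t) in self.channels]


tokens = [
    Token("ID", "x"),
    Token("COMMENT", "/* counter */", COMMENT_CHANNEL),
    Token("ASSIGN", "="),
    Token("INT", "1"),
]

# Option 0: default behaviour, comments stay hidden.
code_only = MultiChannelStream(tokens)

# Option 1: move COMMENT tokens back onto the default channel.
rerouted = MultiChannelStream(tokens)
rerouted.set_token_type_channel("COMMENT", DEFAULT_CHANNEL)

# Option 2: a stream that reads from several channels at once.
with_comments = MultiChannelStream(
    tokens, channels=(DEFAULT_CHANNEL, COMMENT_CHANNEL)
)
```

A tool that ignores comments would use code_only; a tool that needs
them could use either of the other two. Note that in both of those
cases the consumer sees comment tokens inline, so a real parser would
have to accept them in its grammar.
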
Unless comment usage is rather restricted in your language, though, you
probably want to keep comments off-channel so the grammar doesn't have
to handle them. In a language where comments are allowed anywhere, that
would mean having "comment*" between all tokens in your grammar. You
can use CommonTokenStream.getTokens to get all tokens between given
token indexes, optionally restricted by token type. Or you may want to
subclass CommonTokenStream and add a method that returns all tokens on
a given channel between given indexes.

Tom.

> Bjoern
>