[antlr-interest] Emitting (additional) imaginary tokens in the C target

Thu Jun 14 07:15:26 PDT 2007

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-

> Much like the Python example on page 110 of the ANTLR book, I think I
> am in a situation where my lexer will have to emit additional
> imaginary tokens in order to help the parser. Would be easy but as
> the book explains, this requires you to emit multiple tokens per
> rule, and ANTLR is built with the assumption that you'll emit exactly
> one token at a time.
>
> The idea of "subclassing" in the C language target doesn't sound like
> it will be very much fun, although I am sure it is possible. Before I
> go down this path I wanted to ask if anyone is doing multiple token
> emission using the C target?

Well, first of all, I designed the C runtime so that the subclassing
thing is very easy. All you have to do is write your own nextToken()
function and install it after the parser is created. Once you have
worked out the code for that function, you are done.
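
As a rough illustration of that pattern, here is a minimal, self-contained
sketch. It does not use the real ANTLR3 C runtime: TokenSource, Token,
the TOK_* constants and every function name below are hypothetical
stand-ins. It only shows the general idea of saving the original
nextToken() pointer, installing a replacement, and having the replacement
drain a small queue of extra tokens before asking the original for more.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the ANTLR3 C runtime types. */
enum { TOK_EOF = -1, TOK_ID = 1, TOK_NEWLINE = 2, TOK_INDENT = 3 };

typedef struct Token { int type; } Token;

#define PENDING_MAX 8

typedef struct TokenSource TokenSource;
struct TokenSource {
    Token *(*nextToken)(TokenSource *ts);     /* slot the parser calls   */
    Token *(*baseNextToken)(TokenSource *ts); /* saved original function */
    Token  *pending[PENDING_MAX];             /* FIFO of extra tokens    */
    size_t  head, count;
    Token  *input;                            /* canned "lexer" output   */
    size_t  pos, len;
};

static Token eofToken    = { TOK_EOF };
static Token indentToken = { TOK_INDENT };    /* the imaginary token */

/* Stand-in for the generated lexer: exactly one token per call. */
Token *baseNextToken(TokenSource *ts)
{
    return ts->pos < ts->len ? &ts->input[ts->pos++] : &eofToken;
}

/* Append an extra token to the FIFO. */
void pushPending(TokenSource *ts, Token *t)
{
    assert(ts->count < PENDING_MAX);
    ts->pending[(ts->head + ts->count) % PENDING_MAX] = t;
    ts->count++;
}

/* Replacement nextToken(): drain queued tokens first, then fall back
 * to the original. After a NEWLINE it queues one imaginary INDENT, so
 * a single underlying token yields two tokens for the parser. */
Token *queuedNextToken(TokenSource *ts)
{
    Token *t;

    if (ts->count > 0) {
        t = ts->pending[ts->head];
        ts->head = (ts->head + 1) % PENDING_MAX;
        ts->count--;
        return t;
    }
    t = ts->baseNextToken(ts);
    if (t->type == TOK_NEWLINE)
        pushPending(ts, &indentToken);
    return t;
}

void initSource(TokenSource *ts, Token *input, size_t len)
{
    ts->nextToken = baseNextToken;
    ts->baseNextToken = NULL;
    ts->head = ts->count = 0;
    ts->input = input;
    ts->pos = 0;
    ts->len = len;
}

/* The "subclassing" step: keep the old pointer, install the new one. */
void installQueuedNextToken(TokenSource *ts)
{
    ts->baseNextToken = ts->nextToken;
    ts->nextToken = queuedNextToken;
}
```

With this in place the caller keeps calling ts->nextToken(ts) as before
and never sees the queue. In a real grammar the decision about when to
queue an extra token would live in the lexer rule's action rather than
in a hard-coded NEWLINE check.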

> 
> My current analysis suggests that I will have to do these things:
> 
> 1. Override the emit() and emitNew() functions (defined in
> antlr3lexer.c) to push tokens onto a list (pANTLR3_LIST type) rather

> both would need to be overridden (Jim, I think you should probably
> change emit() to call emitNew() rather than doing "lexer->token =
> token;" for this very reason).

I will look again, but I don't think so. The lexer->token is only what
the rule sets up for picking up and adding to the token list - if you
need a new mechanism to emit multiple tokens, then you would not use
that at all anyway.

However, I would be surprised if you actually did need to do this. I am
not even sure that Ter did this on the Python example because it was the
only way to deal with the stupid indent (I have not really looked at
that problem), but what makes you think that you need to emit two tokens
from a single rule rather than have two rules?

Secondly, if it is just one rule, then you can probably just hijack the
code that picks this up in the first place and add it to the pre-existing
list anyway, then emit the second token as normal.

Jim