[antlr-interest] Emitting (additional) imaginary tokens in the C target

Thu Jun 14 03:22:57 PDT 2007

Much like the Python example on page 110 of the ANTLR book, I think I  
am in a situation where my lexer will have to emit additional  
imaginary tokens in order to help the parser. This would be easy, but
as the book explains, it requires you to emit multiple tokens per
rule, and ANTLR is built on the assumption that you'll emit exactly
one token at a time.

As noted in the book and in the ANTLR source code it "[c]urrently  
does not support multiple emits per nextToken invocation for  
efficiency reasons. Subclass and override this method and nextToken  
(to push tokens into a list and pull from that list rather than a  
single variable as this implementation does)."

The idea of "subclassing" in the C language target doesn't sound like  
it will be very much fun, although I am sure it is possible. Before I  
go down this path I wanted to ask if anyone is doing multiple token  
emission using the C target?

My current analysis suggests that I will have to do these things:

1. Override the emit() and emitNew() functions (defined in  
antlr3lexer.c) to push tokens onto a list (pANTLR3_LIST type) rather  
than store them in a single variable. Looking at the Java source code  
it looks like overriding one of the emit methods would be enough  
(because one calls the other), but in the C target case it looks like  
both would need to be overridden (Jim, I think you should probably  
change emit() to call emitNew() rather than doing "lexer->token =  
token;" for this very reason).

2. Override the nextToken() function (again defined in antlr3lexer.c)  
to pop a token off the list rather than look for it in a single  
variable. nextToken() would only actually call mTokens() when the  
list is empty.
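
A minimal sketch of the two steps above, written against stand-in types rather than the real runtime. Everything here (SketchLexer, sketch_emit, sketch_match, sketch_next_token, the TOK_* values and the one-character-per-rule fake input) is hypothetical and only illustrates the queueing idea; a real version would presumably use pANTLR3_LIST and the runtime's own token type instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; NOT the ANTLR3 C runtime types. */
enum { TOK_EOF = 0, TOK_NEWLINE, TOK_INDENT, TOK_DEDENT };

typedef struct Node { int type; struct Node *next; } Node;

typedef struct {
    Node *head, *tail;  /* FIFO of emitted-but-unconsumed tokens */
    const char *input;  /* fake input: one character per rule match */
} SketchLexer;

/* Step 1: emit appends to the queue instead of overwriting a single slot. */
static void sketch_emit(SketchLexer *lx, int type)
{
    Node *n = (Node *)malloc(sizeof *n);
    assert(n != NULL);
    n->type = type;
    n->next = NULL;
    if (lx->tail) lx->tail->next = n; else lx->head = n;
    lx->tail = n;
}

/* Stand-in for mTokens(): a single rule match may emit several tokens. */
static void sketch_match(SketchLexer *lx)
{
    switch (*lx->input) {
    case '\0': sketch_emit(lx, TOK_EOF); return;  /* never advance past the end */
    case '>':  sketch_emit(lx, TOK_NEWLINE); sketch_emit(lx, TOK_INDENT); break;
    case '<':  sketch_emit(lx, TOK_NEWLINE); sketch_emit(lx, TOK_DEDENT); break;
    default:   sketch_emit(lx, TOK_NEWLINE); break;
    }
    lx->input++;
}

/* Step 2: nextToken pops from the queue and only matches when it is empty. */
static int sketch_next_token(SketchLexer *lx)
{
    Node *n;
    int type;
    if (lx->head == NULL) sketch_match(lx);
    n = lx->head;
    assert(n != NULL);
    type = n->type;
    lx->head = n->next;
    if (lx->head == NULL) lx->tail = NULL;
    free(n);
    return type;
}
```

With the fake input ">-<", successive calls to sketch_next_token() hand back the extra INDENT and DEDENT tokens one at a time, even though each was emitted in the same rule match as the NEWLINE before it.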

Here "override" means writing new functions (and sticking them in my
@lexer::members section, I guess) and at some point updating the lexer
to point to the new implementations (after antlr3LexerNew has been
called); I guess I do this in the code where I instantiate my lexer:

pMyLexer = MyLexerNew(stream);
pMyLexer->pLexer->tokSource->emit = mySpecialEmitFunction;
pMyLexer->pLexer->tokSource->nextToken = mySpecialNextTokenFunction;
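
For what it's worth, the general pattern of "overriding" by swapping a function pointer after construction can be sketched as below. The Source struct and every function name here are hypothetical and are not ANTLR3 types; the only point is that saving the original pointer before replacing it lets the new function still delegate to the default behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a runtime object; NOT an ANTLR3 type. */
typedef struct Source Source;
struct Source {
    int (*nextToken)(Source *self);  /* "virtual method" slot */
    int calls;                       /* how often the default was reached */
};

/* Default implementation, as a constructor would install it. */
static int default_next_token(Source *self)
{
    self->calls++;
    return 7;
}

static void source_init(Source *s)
{
    s->nextToken = default_next_token;
    s->calls = 0;
}

/* Original pointer, kept so the override can call "super". */
static int (*saved_next_token)(Source *self);

/* Replacement: custom behaviour, then delegation to the saved default. */
static int my_next_token(Source *self)
{
    return saved_next_token(self) + 100;
}

/* Call once, after the object has been constructed. */
static void install_override(Source *s)
{
    saved_next_token = s->nextToken;
    s->nextToken = my_next_token;
}
```

A file-scope saved pointer like this only works for a single instance; with several lexers the saved pointer would have to live alongside each object.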

Anyway, before I go down this possibly painful track, I wanted to ask
if anyone has done this before with the C target; I don't want to have
to reinvent the wheel.

Cheers,
Wincent