[antlr-interest] Emitting (additional) imaginary tokens in the C target
Wincent Colaiuta
win at wincent.com
Thu Jun 14 03:22:57 PDT 2007
Much like the Python example on page 110 of the ANTLR book, I think I
am in a situation where my lexer will have to emit additional
imaginary tokens in order to help the parser. This would be easy, but as
the book explains, it requires you to emit multiple tokens per
rule, and ANTLR is built with the assumption that you'll emit exactly
one token at a time.
As noted in the book and in the ANTLR source code it "[c]urrently
does not support multiple emits per nextToken invocation for
efficiency reasons. Subclass and override this method and nextToken
(to push tokens into a list and pull from that list rather than a
single variable as this implementation does)."
The idea of "subclassing" in the C language target doesn't sound like
it will be very much fun, although I am sure it is possible. Before I
go down this path I wanted to ask if anyone is doing multiple token
emission using the C target?
My current analysis suggests that I will have to do these things:
1. Override the emit() and emitNew() functions (defined in
antlr3lexer.c) to push tokens onto a list (pANTLR3_LIST type) rather
than store them in a single variable. Looking at the Java source code
it looks like overriding one of the emit methods would be enough
(because one calls the other), but in the C target case it looks like
both would need to be overridden (Jim, I think you should probably
change emit() to call emitNew() rather than doing "lexer->token =
token;" for this very reason).
2. Override the nextToken() function (again defined in antlr3lexer.c)
to pop a token off the list rather than look for it in a single
variable. nextToken() would only actually call mTokens() when the
list is empty.
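Roughly, the two steps above might look like the following sketch. It deliberately avoids the real runtime types: SketchLexer, Token, sketch_emit, sketch_match_rule and sketch_next_token are hypothetical stand-ins (for the lexer, emit(), mTokens() and nextToken() respectively), and a small fixed-size ring buffer takes the place of a pANTLR3_LIST. The ':' rule that emits an imaginary INDENT token is invented purely to show one rule producing two tokens.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 16

enum { TOK_EOF = 0, TOK_WORD, TOK_COLON, TOK_INDENT };

typedef struct { int type; } Token;      /* stand-in for a real token */

typedef struct {
    Token       items[QUEUE_CAP];        /* pending tokens (ring buffer) */
    size_t      head;                    /* index of the oldest pending token */
    size_t      count;                   /* number of pending tokens */
    const char *input;                   /* remaining input characters */
} SketchLexer;

/* Step 1: emit appends to the queue instead of overwriting one slot. */
static void sketch_emit(SketchLexer *lx, int type)
{
    assert(lx->count < QUEUE_CAP);
    lx->items[(lx->head + lx->count) % QUEUE_CAP].type = type;
    lx->count++;
}

/* Stand-in for mTokens(): matches one rule, which may emit several tokens. */
static void sketch_match_rule(SketchLexer *lx)
{
    char c = *lx->input;

    if (c == '\0') {
        sketch_emit(lx, TOK_EOF);
        return;
    }
    lx->input++;
    if (c == ':') {
        sketch_emit(lx, TOK_COLON);
        sketch_emit(lx, TOK_INDENT);     /* the extra, imaginary token */
    } else {
        sketch_emit(lx, TOK_WORD);
    }
}

/* Step 2: nextToken pops from the queue and only runs a rule when empty. */
static Token sketch_next_token(SketchLexer *lx)
{
    Token t;

    if (lx->count == 0)
        sketch_match_rule(lx);           /* always queues at least one token */
    t = lx->items[lx->head];
    lx->head = (lx->head + 1) % QUEUE_CAP;
    lx->count--;
    return t;
}
```

With the input "a:b", successive calls to sketch_next_token() hand back WORD, COLON, INDENT, WORD and then EOF; the third call is served from the queue without matching another rule.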
Here "override" means write new functions (and stick them in my
@lexer::members section, I guess) and at some point update the lexer
to point to the new implementations (after antlr3LexerNew has been
called); I guess I do this in the code where I instantiate my lexer:
    pMyLexer = MyLexerNew(stream);
    /* emit is a member of the lexer itself; nextToken lives on its token source */
    pMyLexer->pLexer->emit = mySpecialEmitFunction;
    pMyLexer->pLexer->tokSource->nextToken = mySpecialNextTokenFunction;
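For anyone unfamiliar with this style of "overriding" in C, here is a minimal sketch of the general pattern: a struct carries a function pointer, the constructor installs a default, and the caller swaps in a replacement afterwards, saving the original so the replacement can still delegate to it. DemoLexer and every function name here are hypothetical and not part of the ANTLR runtime.

```c
#include <stddef.h>

typedef struct DemoLexer DemoLexer;

struct DemoLexer {
    int (*nextToken)(DemoLexer *self);   /* replaceable "method" */
    int calls;                           /* how often the default ran */
};

/* Default implementation, installed by the constructor. */
static int default_next_token(DemoLexer *self)
{
    self->calls++;
    return 1;
}

static DemoLexer demo_lexer_new(void)
{
    DemoLexer lx = { default_next_token, 0 };
    return lx;
}

/* Saved pointer to the original, so the override can call "super". */
static int (*saved_next_token)(DemoLexer *) = NULL;

/* Replacement: delegates to the original, then adjusts the result. */
static int my_next_token(DemoLexer *self)
{
    return saved_next_token(self) + 100;
}

/* Performed once, after construction, by whoever instantiates the lexer. */
static void install_override(DemoLexer *lx)
{
    saved_next_token = lx->nextToken;
    lx->nextToken = my_next_token;
}
```

Keeping the saved pointer in a file-scope variable is the simplest option but only works for a single lexer instance; storing it in a per-lexer struct would be safer if several lexers are alive at once.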
Anyway, before I go down this possibly painful track I wanted to ask if
anyone has done this before with the C target: I don't want to have to
re-invent the wheel...
Cheers,
Wincent