[antlr-interest] How to set imaginary token text?
Vaclav Barta
vbar at comp.cz
Tue Jul 17 10:14:24 PDT 2007
Randall R Schulz wrote:
> On Monday 16 July 2007 22:13, Vaclav Barta wrote:
>> On Monday 16 July 2007 21:20, Randall R Schulz wrote:
>>> Let me clarify that it is at the lexical level that a
>>> token-per-character approach incurs potentially excessive overhead.
>>> For example, a whitespace rule that matched single white-space
>>> characters vs. one that collected them together could make a large
>>> difference in
>> Well, I'm not tokenizing whitespace characters individually. String
>> characters may well run into thousands, but what's a few thousand
>> objects between friends?
I take that back; I do, in fact, tokenize some whitespace characters
individually (the input format is something like a makefile, so
e.g. tabs are individual tokens) - just not all of them...
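
To make the trade-off concrete, here is a minimal Python sketch (not ANTLR, and not my actual grammar) of a makefile-like tokenizer. The rule names TAB, SPACES, NEWLINE and WORD are made up for the illustration; the only point is that each tab becomes its own token while runs of spaces are collected into one.

```python
import re

# Hypothetical rules for a makefile-like format: every tab is its own token,
# while a run of spaces is collected into a single token.
TOKEN_RE = re.compile(
    r"(?P<TAB>\t)|(?P<SPACES> +)|(?P<NEWLINE>\n)|(?P<WORD>[^ \t\n]+)"
)

def tokenize(text):
    """Return a list of (rule name, matched text) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

sample = "\t\tcc   -o  out\n"
tokens = tokenize(sample)

# Token count if every whitespace character were a separate token:
# one per whitespace character plus one per word.
per_char_count = sum(ch in " \t\n" for ch in sample) + len(sample.split())

for name, text in tokens:
    print(name, repr(text))
print(len(tokens), "tokens with collected spaces,", per_char_count, "per character")
```

With this sample the difference is small, which is about what I'd expect for input where only tabs are significant one at a time.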
> You had a non-fragment lexer rule whose right-hand-side was a single dot
> (any-character wildcard). This does indeed create a single token for
> each character it matches. That was what prompted my original
That's true, as far as it goes, but as I said, I define that token last,
so it doesn't match very many characters: only unescaped characters in
double-quoted strings that can't appear in non-quoted strings, I think
(and errors, of course).
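
A rough Python sketch of the "catch-all defined last" idea, for illustration only. The rule names (WORD, WS, ANY) and patterns are hypothetical, and regex alternation tried in listed order is an assumption that only approximates how an ANTLR lexer chooses between rules; it is not the ANTLR algorithm itself.

```python
import re
from collections import Counter

# Hypothetical rules, tried in the order listed. The single-character
# catch-all ANY comes last, so it only sees what no earlier rule matched.
RULES = [
    ("WORD", r"[A-Za-z0-9_]+"),
    ("WS", r"[ \t]+"),
    ("ANY", r"."),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES),
    re.DOTALL,  # let the catch-all match newlines too
)

def tokenize(text):
    """Return a list of (rule name, matched text) pairs."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(text)]

sample = 'foo = "a$b"'
tokens = tokenize(sample)
counts = Counter(name for name, _ in tokens)

for name, text in tokens:
    print(name, repr(text))
print(dict(counts))
```

Here the catch-all produces one token per character, but only for the few characters ('=', '$' and the quotes) that the earlier rules leave over.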
Bye
Vasek