[antlr-interest] How to set imaginary token text?

Vaclav Barta vbar at comp.cz
Tue Jul 17 10:14:24 PDT 2007


Randall R Schulz wrote:
> On Monday 16 July 2007 22:13, Vaclav Barta wrote:
>> On Monday 16 July 2007 21:20, Randall R Schulz wrote:
>>> Let me clarify that it is at the lexical level that a
>>> token-per-character approach incurs potentially excessive overhead.
>>> For example, a whitespace rule that matched single white-space
>>> characters vs. one that collected them together could make a large
>>> difference in
>> Well, I'm not tokenizing whitespace characters individually. String
>> characters may well run into thousands, but what's a few thousand
>> objects between friends?
I take that back; I do, in fact, tokenize some whitespace characters
individually (the input format is something like a makefile, so
e.g. tabs are individual tokens) - just not all of them...
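
To put a rough number on the difference, here is a minimal sketch in Python rather than ANTLR. The patterns and the sample input are hypothetical and are not the grammar discussed in this thread. It keeps each tab as its own token, as in a makefile-like format, and compares one token per whitespace character against one token per run of whitespace.

```python
import re

# Hypothetical token patterns, for illustration only.
# TAB is tried first, so a tab is always an individual token.
PER_CHAR = re.compile(r"(?P<TAB>\t)|(?P<WS>[ \r\n])|(?P<WORD>[^\s]+)")
COLLECTED = re.compile(r"(?P<TAB>\t)|(?P<WS>[ \r\n]+)|(?P<WORD>[^\s]+)")


def tokenize(pattern, text):
    """Return (kind, text) pairs for every match of pattern over text."""
    return [(m.lastgroup, m.group()) for m in pattern.finditer(text)]


# Sample input built explicitly so the number of spaces is unambiguous.
sample = (
    "all:" + " " * 4 + "main.o\n"
    + "\tcc" + " " * 3 + "-o" + " " * 3 + "main main.o\n"
)

per_char = tokenize(PER_CHAR, sample)
collected = tokenize(COLLECTED, sample)

print(len(per_char), "tokens with one WS token per character")
print(len(collected), "tokens with WS runs collected")
```

Both variants cover the same input; only the number of token objects changes, and the gap grows with the length of the whitespace runs.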

> You had a non-fragment lexer rule whose right-hand-side was a single dot 
> (any-character wildcard). This does indeed create a single token for 
> each character it matches. That was what prompted my original 
That's true, as far as it goes, but as I said, I define that token last, 
so it doesn't match very many characters: only those unescaped characters 
in double-quoted strings that can't appear in non-quoted strings, I think 
(plus erroneous input, of course).
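
A minimal sketch of the "catch-all defined last" idea, again in Python and with hypothetical rule names. It models plain first-match-wins ordering, which is an assumption made for illustration and not a description of how the ANTLR lexer resolves rules. The point is only that a single-character wildcard listed after the specific rules picks up just the leftovers.

```python
import re

# Hypothetical rules; the catch-all ANY is deliberately listed last.
RULES = [
    ("TAB", r"\t"),
    ("WORD", r"[A-Za-z0-9_.]+"),
    ("SPACE", r" +"),
    ("ANY", r"."),  # one token per character, but only for leftovers
]
LEXER = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in RULES),
    re.DOTALL,  # let ANY match newlines too
)


def token_kinds(text):
    """Return (kind, text) pairs, trying the rules in listed order."""
    return [(m.lastgroup, m.group()) for m in LEXER.finditer(text)]


tokens = token_kinds("\tcc -o main.o $@")
any_tokens = [text for kind, text in tokens if kind == "ANY"]
print(any_tokens)
```

Only the characters that no earlier rule accepts end up as single-character tokens; everything else is absorbed by the more specific rules.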

	Bye
		Vasek



