[antlr-interest] Source positions for imaginary tokens

Mike Lischke mike at lischke-online.de
Wed Sep 12 00:40:41 PDT 2012


Jim,

> It is not a bug with the C target, as I have explained on numerous
> occasions. The other targets rely on method signatures to select the
> correct re-write. This is not available in C.

Sorry, I have never seen such an explanation in any of the searches I have done on this list. You surely know the internals far better than I do, but what specifically is missing that prevents creating an imaginary token with info from another token? Making a construct like DUM[$lb] work doesn't sound very complicated.

> 
> However, the information is erroneous anyway. Look at the generated code
> and you will see that only root nodes are fixed up with positional info.
> 
> Finally, rewriting like that is very expensive. I don't recommend it
> anyway.

You are probably referring to the complete original example, while I'm specifically after a simple way to change properties of a token (ideally in a way that can be written target-independently). A good example is a list of keywords that must sometimes be interpreted as normal identifiers. What would be simple is something like:

keywords:
	(
	kw = KEYWORD1
	| kw = KEYWORD2
	...
	)
	-> ID[$kw]
;

No additional information is necessary, I'd say; everything is there. Still, the C target produces incorrect code (it uses kw like a string, IIRC).
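To make the two rewrite forms concrete, here is a minimal C sketch. It assumes a hypothetical, simplified `Token` struct (not the ANTLR 3 C runtime's token type) and hypothetical helpers `token_from_token` and `token_from_text`. Since C has no overloading, "create a token from a token" (`ID[$kw]`) and "create a token from text" (`ID["..."]`) need two differently named functions, and the generated code must pick one at generation time. If the text variant were chosen for `$kw`, the source position would be lost, which is one way a token argument could end up being treated like a string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified token record; NOT the ANTLR 3 C runtime's
   token structure, just enough fields to show the idea. */
typedef struct {
    int  type;          /* token type, e.g. ID or DUM */
    char text[64];      /* token text */
    int  line;          /* 1-based source line */
    int  charPosition;  /* 0-based column within the line */
} Token;

/* Sketch of what ID[$kw] is meant to do: a new token of type `type`
   that borrows text and source position from an existing token. */
Token token_from_token(int type, const Token *from)
{
    assert(from != NULL);
    Token t = *from;    /* copy text, line and column */
    t.type = type;      /* only the type changes */
    return t;
}

/* Sketch of what ID["text"] does: a new token with no real source
   position. Without overloading this needs a separate name. */
Token token_from_text(int type, const char *text)
{
    Token t;
    memset(&t, 0, sizeof t);
    t.type = type;
    strncpy(t.text, text, sizeof t.text - 1);
    t.line = 0;          /* unknown line */
    t.charPosition = -1; /* unknown column */
    return t;
}
```

The design point is simply that in C the choice between the two constructors cannot be deferred to the compiler's overload resolution; the code generator has to know which kind of argument it is emitting.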

So what I do now (as I really need this) is:

keywords:
	KEYWORD1
	| KEYWORD2
	...
;
finally
{
	retval.start->setType(retval.start, IDENTIFIER);
}

which is rather a hack IMO, but it's the simplest solution I could come up with. I'm all ears for better solutions, if there are any.

Btw., when a feature really cannot be implemented in the C target, wouldn't it be better for the tool to report an error at generation time, so the grammar developer knows he cannot use this feature, instead of letting him believe all is fine? Otherwise he's condemned to debug the grammar until he finds out that the generated code is wrong (which can take quite some time with big grammars, where loading the parser into the editor can easily take 20-30 secs).

Mike
-- 
www.soft-gems.net



