[antlr-interest] Source positions for imaginary tokens

Jim Idle jimi at temporal-wave.com
Wed Sep 12 13:18:26 PDT 2012


Sure – I can make it be either of those calls, but not both at once. I have
no context at code generation time that can tell me which one to generate.
If I change it to this, then all the people who want it the other way will
claim that they have found a bug too. It only works in Java because the
Java compiler can see what the argument types are, and can therefore call
the “correct” method.
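
A rough illustration of the point above, in plain C. This is not ANTLR
runtime code: SketchToken and both function names are made up for the
sketch. In Java the two variants could share one name and the compiler
would choose by argument type. In C they need distinct names, so whoever
emits the call for something like ID[$kw] has to decide up front which name
to write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified token; NOT the ANTLR3 C runtime structure. */
typedef struct {
    int  type;
    int  line;
    int  column;
    char text[32];
} SketchToken;

/* Variant 1: the argument is an existing token.
 * Position and text are copied from it; only the type changes. */
static SketchToken sketch_create_from_token(int type, const SketchToken *from)
{
    assert(from != NULL);
    SketchToken t = *from;
    t.type = type;
    return t;
}

/* Variant 2: the argument is a string.
 * There is no source token, so the position is left at placeholder values. */
static SketchToken sketch_create_from_text(int type, const char *text)
{
    assert(text != NULL);
    SketchToken t = { type, 0, -1, "" };
    strncpy(t.text, text, sizeof t.text - 1);  /* last byte stays '\0' */
    return t;
}
```

A generator that only sees the text "$kw" cannot tell from the grammar alone
whether it names a token or a string, so it cannot know which of the two
functions to call.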



However, it is much simpler to just use code to operate on the token
directly. Even before that, you should consider whether you need to change
something about the token because a later stage MUST receive a different
token, or whether you just think that you WANT it to.
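
As a minimal sketch of what operating on the token directly amounts to,
assuming a simplified stand-in token (PosToken and the helper below are
hypothetical, not the ANTLR3 C runtime API; the real runtime call is the
setType() quoted further down in this thread): only the type field is
changed, so the line and column the lexer recorded stay as they were.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a token; not the runtime's token type. */
typedef struct {
    int type;
    int line;
    int column;
} PosToken;

/* Token type numbers chosen arbitrarily for the sketch. */
enum { KEYWORD1 = 1, KEYWORD2 = 2, IDENTIFIER = 3 };

/* Retype a keyword token as an identifier in place.
 * Nothing is copied or re-created, so the position is untouched. */
static void retype_keyword_as_identifier(PosToken *tok)
{
    assert(tok != NULL);
    if (tok->type == KEYWORD1 || tok->type == KEYWORD2) {
        tok->type = IDENTIFIER;
    }
}
```

Because no new token is created, there is no positional info to fix up
afterwards.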



Jim



*From:* A Z [mailto:asicaddress at gmail.com]
*Sent:* Wednesday, September 12, 2012 8:30 AM
*To:* Mike Lischke
*Cc:* Jim Idle; antlr-interest at antlr.org
*Subject:* Re: [antlr-interest] Source positions for imaginary tokens



I solved this by hacking the code generator to call the createTypeToken()
function instead of the createTokenText() or createTypeText() functions
that the generated code normally calls. You might be able to avoid this
change by using ID[$kw,""] in your grammar.

Ad

On Wed, Sep 12, 2012 at 2:40 AM, Mike Lischke <mike at lischke-online.de>
wrote:


Jim,


> It is not a bug with the C target, as I have explained on numerous
> occasions. The other targets rely on method signatures to select the
> correct re-write. This is not available in C.

Sorry, I have never seen such an explanation in all the searches I have
already done on this list. You surely know the internals far better than I
do, but what specifically is missing that prevents you from creating a
virtual token with info from another token? Making a construct like
DUM[$lb] work doesn't sound very complicated.


>
> However, the information is erroneous anyway. Look at the generated code
> and you will see that only root nodes are fixed up with positional info.
>
> Finally, rewriting like that is very expensive. I don't recommend it
> anyway.

You are probably referring to the complete original example while I'm
specifically after a simple way to change properties of a token (especially
when it can be written target-independently). A good example is the list of
keywords, which must sometimes be interpreted as normal identifiers, so
what would be simple is something like:

keywords:
        (
        kw = KEYWORD1
        | kw = KEYWORD2
        ...
        )
        -> ID[$kw]
;

I'd say no separate info is necessary, since everything is there. Still,
the C target produces incorrect code (using kw like a string IIRC).

So what I do now (as I really need this) is:

keywords:
        KEYWORD1
        | KEYWORD2
        ...
;
finally
{
        retval.start->setType(retval.start, IDENTIFIER);
}

which is rather a hack IMO, but the simplest solution I could come up with.
I'm all ears for better solutions, if there are any.

Btw, when a feature really cannot be implemented in the C target, wouldn't
it be better to write out some error message that the compiler complains
about, so the grammar developer knows he cannot use this feature, instead
of letting him believe all is fine? Otherwise he's condemned to debug the
grammar until he finds out the produced code is wrong (which can take quite
some time when working with big grammars, where loading the parser into the
editor can easily take 20-30 secs).


Mike
--
www.soft-gems.net


