[antlr-interest] [ANTLR3C] how to create a C++ std::string from tokens

Gitsis Christos cgitsis at gmail.com
Thu Feb 2 20:02:03 PST 2012


Hello,

I would like to use std::string in my grammar. In fact I am already using
it, roughly like this:

website returns [std::string s]: STRING
  { /* TODO: remove reinterpret casts */
    $s = string(reinterpret_cast<const char*> ($text->chars));
  };

Surely there is a better way. I have found a message from Jim Idle (here:
http://antlr.markmail.org/message/4altudq2tagicz2z?q=std+string#query:std%20string+page:1+mid:7yok3nvtiqgekwec+state:results
)
implying that I should do it another way, but I would like someone to
elaborate. His advice was:

"Get a token reference: mytok=TOKEN

Then create a factory method or a function or whatever that takes a token
pointer and creates your C++ string. Do not use the $text references as
these will create ANTLR3_STRING that you don't need."

I tried this:

website returns [std::string s]: strtoken=STRING
  {
    ANTLR3_MARKER c1 = $strtoken->start;
    ANTLR3_MARKER c2 = $strtoken->stop;
    /* ... ? */
  };

c1 and c2 seem to be long numbers. I cannot make much sense of them, nor of
the antlr3c source code. How should I continue?
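For what it is worth, here is a minimal sketch of the kind of helper Jim's
advice seems to describe. It rests on an assumption about the 3.x C runtime
that should be checked against your version: for 8-bit input, a token's start
and stop markers hold the addresses of its first and last characters in the
input buffer, with stop inclusive. TokenSpan, Marker and tokenToString are
hypothetical names invented for this sketch; real code would include antlr3.h
and read the same two fields from a pANTLR3_COMMON_TOKEN.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the two token fields used below.
// Assumption: ANTLR3_MARKER is a pointer-sized integer.
using Marker = std::intptr_t;

struct TokenSpan {
    Marker start;  // assumed: address of the token's first character
    Marker stop;   // assumed: address of the token's last character (inclusive)
};

// Hypothetical helper: copy the token's characters straight out of the
// input buffer, without going through $text and ANTLR3_STRING.
inline std::string tokenToString(const TokenSpan& tok) {
    if (tok.stop < tok.start) {
        return std::string();  // empty token: nothing to copy
    }
    const char* first = reinterpret_cast<const char*>(tok.start);
    const char* last  = reinterpret_cast<const char*>(tok.stop);
    return std::string(first, static_cast<std::size_t>(last - first) + 1);
}
```

In the grammar action this would be called with the values of
$strtoken->start and $strtoken->stop. The sketch only applies to 8-bit
encodings; a UTF-16 input stream would need a conversion step instead of a
plain byte copy.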

And could this method be generalized so as to receive the whole text
matched for one rule, e.g.

dateTime : INT '/' INT '/' INT INT ':' INT ':' INT
  {
    string s = string(reinterpret_cast<const char*> ($text->chars));
    $timestamp::t = boost::posix_time::time_from_string(s);
  };
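One possible generalization, sketched under the same unverified assumption
that token markers are addresses into one contiguous 8-bit input buffer: copy
everything from the first character of the rule's first token to the last
character of its last token. TokenBounds and textBetween are hypothetical
names for illustration, not part of the ANTLR runtime. Because the copy works
on the raw buffer, any off-channel characters between the two tokens (such as
whitespace) are included.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the start/stop fields of a token.
// Assumption: both hold character addresses, and stop is inclusive.
struct TokenBounds {
    std::intptr_t start;
    std::intptr_t stop;
};

// Hypothetical helper: return the input text covered by a rule, given the
// first and last tokens it matched.
inline std::string textBetween(const TokenBounds& first, const TokenBounds& last) {
    if (last.stop < first.start) {
        return std::string();  // rule matched nothing
    }
    const char* begin = reinterpret_cast<const char*>(first.start);
    const char* end   = reinterpret_cast<const char*>(last.stop);
    return std::string(begin, static_cast<std::size_t>(end - begin) + 1);
}
```

For the dateTime rule, the first and last tokens could be captured with
labels on the first and last INT, in the same way as strtoken above.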

