[antlr-interest] [ANTLR3C] how to create a C++ std::string from tokens

Thu Feb 2 20:26:05 PST 2012

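Labelling the token (strtoken=STRING) tells ANTLR to create a pointer
(pANTLR3_COMMON_TOKEN) to that token, which you reference as $strtoken. Its
start and stop fields are stored as ANTLR3_MARKER values, but they are
themselves pointers into your input stream, so you can cast them to
"const char *". Note that stop points at the last character of the token,
not one past it. Then you can use something like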

std::string ident((const char *)start,
                  ((const char *)stop + 1) - (const char *)start);
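
Wrapped up as a helper, that could look roughly like the sketch below. The
names Marker and marker_text are made up for illustration and are not part
of the ANTLR3 C runtime; Marker stands in for ANTLR3_MARKER, which I am
assuming is a pointer-sized integer. It also assumes an 8-bit input
encoding, so one char per input character.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in for ANTLR3_MARKER (assumption: a pointer-sized integer).
using Marker = std::intptr_t;

// Hypothetical helper: copy the characters in [start, stop] into a std::string.
// Both markers are addresses into the input buffer; stop is inclusive.
inline std::string marker_text(Marker start, Marker stop)
{
    const char *first = reinterpret_cast<const char *>(start);
    const char *last  = reinterpret_cast<const char *>(stop);
    assert(last + 1 >= first);  // also allows an empty span (stop == start - 1)
    return std::string(first,
                       static_cast<std::string::size_type>(last + 1 - first));
}
```

In a grammar action you would then call it as
marker_text($strtoken->start, $strtoken->stop), with Marker replaced by the
real ANTLR3_MARKER type.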

To get the text for an entire rule, there are two ways that I've used:
1. Get the start field from the first token and the stop field from the
last token. This works inside the parser rule.
2. Assign the rule to a variable in your grammar. When the rule returns,
ANTLR creates a struct containing the first and last tokens, so you can
access them with something like:

some_rule :
  var=sub_rule {ctx->some_function($var.start->start, $var.stop->stop);}
  ;
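
A function on the receiving end could be sketched as below. TokenSpan and
rule_text are hypothetical names, not ANTLR types; TokenSpan only mirrors
the start and stop fields of a token so the example stands on its own, and
an 8-bit input encoding is assumed again.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in for the two token fields used here (hypothetical type).
struct TokenSpan {
    std::intptr_t start;  // address of the token's first character
    std::intptr_t stop;   // address of the token's last character (inclusive)
};

// Hypothetical helper: text from the first token's start to the last
// token's stop, copied straight out of the input buffer.
inline std::string rule_text(const TokenSpan &first, const TokenSpan &last)
{
    const char *begin = reinterpret_cast<const char *>(first.start);
    const char *end   = reinterpret_cast<const char *>(last.stop) + 1;
    assert(end >= begin);
    return std::string(begin, end);
}
```

Because the copy is taken from the input buffer rather than token by token,
anything between the first and last token (whitespace, for instance) ends
up in the string as well.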

On Fri, Feb 3, 2012 at 10:02 AM, Gitsis Christos <cgitsis at gmail.com> wrote:

> Hello,
>
> I would like to use std::strings in my grammar. Actually I am already using
> them, somehow like this:
>
> website returns [std::string s]: STRING
>  { /* TODO: remove reinterpret casts */
>    $s = string(reinterpret_cast<const char*> ($text->chars));
>  };
>
> Surely there is a better way, and actually I have found (here:
>
> http://antlr.markmail.org/message/4altudq2tagicz2z?q=std+string#query:std%20string+page:1+mid:7yok3nvtiqgekwec+state:results
> )
> a message from Jim Idle implying that I must do it another way, but I would
> like someone to elaborate. His advice was:
>
> "Get a token reference: mytok=TOKEN
>
> Then create a factory method or a function or whatever that takes a token
> pointer and creates your C++ string. Do not use the $text references as
> these will create ANTLR3_STRING that you don't need."
>
> I tried this
>
> website returns [std::string s]: strtoken=STRING
>  {
>    ANTLR3_MARKER c1 = $strtoken->start;
>    ANTLR3_MARKER c2 = $strtoken->stop;
>    /* ... ? */
>  };
>
> c1 and c2 seem to be long numbers, I cannot make much sense of them, nor of
> antlr3c's source code. How to continue?
>
> And could this method be generalized so as to receive the whole text
> matched for one rule, e.g.
>
> dateTime : INT'/'INT'/'INT INT':'INT':'INT
>  {
>    string s = string(reinterpret_cast<const char*> ($text->chars));
>    $timestamp::t = boost::posix_time::time_from_string(s);
>  };
>