[antlr-interest] [C] code to change Token type, use char* and loose data when buffer destroyed

Jim Idle jimi at temporal-wave.com
Tue Sep 27 11:45:59 PDT 2011


Each token contains a char * pointer to the start of its text in the input
stream, which is what I generally use, but if you want to use my built-in
string stuff and have it freed automatically, then it is just:

csl
@declarations { pANTLR3_STRING s; }
: s1=STRING
     { s= $s1.text; }
   (
	s2=STRING
	{
		s->appendS(s, $s2.text);  /* appendS takes a pANTLR3_STRING */
	}

   )*
	{ $s1->setText($s1, s);  /* Check that, but I think it is this */ }

	-> $s1
;

You are complicating things though. There is no need to do that in the
parser, just use:

csl
: ( s1+=STRING )+ -> $s1+  /* Or, ->^(SLIT $s1+) */
;

Then just do the string manipulation in the tree walk (which means you
will only use it if you have to). You still need to reference the text of
course.

So, really, don't use $x.text, as it is slow. Just call an external C++
method/object that takes a pointer to the token or base tree object and
extracts the string (start is a void * address of the first char, end is a
void * address of the last char, and the length is the difference plus one,
since end is inclusive). You can make a neat C++ class that accepts either
of these in its constructor and has an overloaded append() and a getCstr().
Don't try to do too much in the action code itself; there is no need to
amalgamate text and things in the parser.
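
As a rough illustration of such a helper class, here is a minimal sketch.
It is not code from the ANTLR runtime: TextSpan, TokenText and makeSpan are
hypothetical names, and TextSpan merely stands in for the first-char and
last-char addresses you would read from a token or from the token behind a
tree node (the real field names and types in the C runtime may differ). The
class copies the characters, so the result outlives the input buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the two addresses read from a token:
// the first and the last character of its text in the input buffer.
struct TextSpan {
    const void* start;  // address of the first character
    const void* end;    // address of the last character (inclusive)
};

// Hypothetical helper: owns a copy of the text, so it remains valid
// after the input stream (and its buffer) has been destroyed.
class TokenText {
public:
    TokenText() = default;
    explicit TokenText(const TextSpan& span) { append(span); }

    // Append the text covered by another token's span.
    TokenText& append(const TextSpan& span) {
        const char* first = static_cast<const char*>(span.start);
        const char* last  = static_cast<const char*>(span.end);
        // end is inclusive, hence the +1; an end before start is
        // treated as an empty span.
        if (first != nullptr && last != nullptr && last >= first) {
            text_.append(first, static_cast<std::size_t>(last - first) + 1);
        }
        return *this;
    }

    // Append a plain NUL-terminated C string.
    TokenText& append(const char* s) {
        if (s != nullptr) text_.append(s);
        return *this;
    }

    const char* getCstr() const { return text_.c_str(); }
    std::size_t length() const { return text_.size(); }

private:
    std::string text_;
};

// Demonstration-only helper: builds a span over buf[first..last].
inline TextSpan makeSpan(const char* buf, std::size_t first, std::size_t last) {
    assert(buf != nullptr);
    return TextSpan{buf + first, buf + last};
}
```

With the input "SELECT 'aaa' 'bbbb'", constructing a TokenText from the span
covering aaa (offsets 8 to 10) and appending the span covering bbbb (offsets
14 to 17) gives a getCstr() of "aaabbbb". Whether a STRING token's span
includes the surrounding quotes depends on your lexer rule, so adjust the
offsets accordingly.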


Jim


> -----Original Message-----
> From: Ruslan Zasukhin [mailto:ruslan_zasukhin at valentina-db.com]
> Sent: Tuesday, September 27, 2011 2:17 AM
> To: antlr-interest at antlr.org; Jim Idle
> Subject: [C] code to change Token type, use char* and loose data when
> buffer destroyed
>
> Hi All,
>
> ===== TASK ======
>
> In SQL we must be able to write
>       SELECT 'aaa' 'bbbb'
>
> And this should be the same as
>       SELECT 'aaabbbb'
>
> I.e. the parser must concatenate the literals itself.
> This was quite easy to do in ANTLR 2,
> and I have already killed 5-6 hours on it in ANTLR 3.  :-((((((
>
>
> I have tried many tricks with ANTLR 3 itself, trying to use its tokens
> and the ANTLR_STRING class, but no luck.
>
> Finally I gave up and tried to use simple code as in v2, using
> STD::string as the place to accumulate the literal.
>
> =================================
> character_string_literal
> @init{
>     STD::string st;
> }
>     :    ( STRING_LITERAL
>             {
>                 st.append(
>                     (const char*) $STRING_LITERAL.text->chars,
>                     $STRING_LITERAL.text->len );
>             }
>         )+
>             -> ^( CONST_STR[ st.c_str() ] )
>     ;
> =================================
>
> But this does not work, because the new Token object stores just the pointer
>
>         newToken->textState        = ANTLR3_TEXT_CHARP;
>         newToken->tokText.chars = (pANTLR3_UCHAR)text;
>
> And as soon as the STD::string dies we get a problem.
>
>
> Jim, how can this simple task be solved in the C target?
>
> Also I see that in Java code they can construct dynamic text and
> produce a token using that text. For example, on this page
>
> http://www.antlr.org/wiki/display/ANTLR3/Tree+construction
>
>                             -> ^('+' $p INT[String.valueOf($a.int+$b.int)])
>
>
> But the C target tries to work only with char*
>
>
> I guess that ANTLR_STRING setText() can help me, but I cannot see how I
> can call that from my
>
>             -> ^( CONST_STR[ st.c_str() ] )
>
> ???
>
> Thank you for any pointers ...
>
>
> --
> Best regards,
>
> Ruslan Zasukhin
> VP Engineering and New Technology
> Paradigma Software, Inc
>
> Valentina - Joining Worlds of Information http://www.paradigmasoft.com
>
> [I feel the need: the need for speed]
>

