[antlr-interest] C Runtime and Strings

Jim Idle jimi at temporal-wave.com
Sat Aug 25 17:04:26 PDT 2007


Yes, you can do that too. The token returned by I1=IDENT... does exactly
this: it records only the start and stop index. Use $I1.xxx, where xxx is
the thing you want from the token, as per the examples, the published book,
etc. For instance, with $I1 you can use getStartIndex and getStopIndex.
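
As a rough sketch of the idea (plain C, not the ANTLR3 runtime API; the
TokenRange and Span types and the span_add and span_length helpers are
made-up names for illustration), combining several tokens into one range
only needs the first start index and the last stop index. It assumes the
stop index is inclusive:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a token: inclusive start/stop offsets
 * into the input buffer. */
typedef struct {
    size_t start;
    size_t stop;
} TokenRange;

/* A range over the input buffer; empty until the first token is added. */
typedef struct {
    size_t start;
    size_t stop;
    int    has_tokens;
} Span;

/* Start the span at the first token, then extend it to each later one.
 * Tokens are assumed to be added in input order. */
static void span_add(Span *span, TokenRange tok)
{
    assert(tok.start <= tok.stop);
    if (!span->has_tokens) {
        span->start = tok.start;
        span->has_tokens = 1;
    }
    span->stop = tok.stop;
}

/* Number of characters covered by the span (0 if nothing was added). */
static size_t span_length(const Span *span)
{
    return span->has_tokens ? span->stop - span->start + 1 : 0;
}
```

For the input "::ns::name;" with token ranges {0,1}, {2,3}, {4,5} and
{6,9}, the span ends up covering offsets 0 through 9, i.e. "::ns::name",
and nothing is copied or allocated.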

However, the string from .text is not materialized until it is asked for,
for performance reasons. Once you have an ANTLR3_STRING, though, you can
use things like append8, addc, length, substring and so on. The string
is constructed from the input buffer in the way you want, though doing it
yourself can be more efficient.
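
To illustrate what "not materialized until asked for" means, here is a
minimal sketch in plain C. It is not the runtime's actual implementation;
LazyText and the lazy_text_* functions are hypothetical names. The struct
only remembers where the match sits in the input buffer, and the
null-terminated copy is made on the first request:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration only; not the ANTLR3 C runtime's types. */
typedef struct {
    const char *begin;   /* first character of the match in the input buffer */
    size_t      length;  /* number of characters in the match */
    char       *copy;    /* null-terminated copy, NULL until requested */
} LazyText;

static LazyText lazy_text_make(const char *begin, size_t length)
{
    LazyText t = { begin, length, NULL };
    return t;
}

/* Returns a null-terminated copy, allocating it on first use.
 * Returns NULL if the allocation fails. */
static const char *lazy_text_get(LazyText *t)
{
    if (t->copy == NULL) {
        char *p = malloc(t->length + 1);
        if (p == NULL) {
            return NULL;
        }
        memcpy(p, t->begin, t->length);
        p[t->length] = '\0';
        t->copy = p;
    }
    return t->copy;
}

/* Releases the cached copy, if any; the input buffer is not touched. */
static void lazy_text_free(LazyText *t)
{
    free(t->copy);
    t->copy = NULL;
}
```

The copy is the price of null termination: the input buffer cannot be
terminated in place without overwriting the character that follows the
match, which is why working with a pointer and a length yourself avoids
the allocation entirely.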

Jim

> -----Original Message-----
> From: Stefan Klinger [mailto:dev.null.nix at gmail.com]
> Sent: Saturday, August 25, 2007 1:34 PM
> To: Jim Idle
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] C Runtime and Strings
> 
> Jim Idle wrote:
> > This is just:
> >
> > I1=IDENTIFIER
> > {
> >   //here $I1.text returns the string for it as ANTLR3_STRING
> >  //      $I1.text->chars is the actual pointer
> 
> 
> But this seems to return a null-terminated string, so it can't point
> into the input buffer directly.
> 
> My idea was to refer to the input buffer directly, so that I can
> simply take the range of the match straight from the input buffer.
> 
> Because if I allocate a buffer for the combined tokens, I have to free
> it in the calling rule, which would be error-prone and slower than
> referring directly to the input buffer.
> 
> Is there any way to do this?
> 
> 
> To explain the idea in a bit more detail (not actually working code):
> 
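> typedef struct CombinedString {
> 	char *begin;	/* first char of the match in the input buffer */
> 	char *end;	/* one past the last char of the match */
> } CombinedString;
> 
> /* Start the range at the first token, then extend it to each later one. */
> void Append(CombinedString *buffer, char *ptr, size_t length) {
> 	if(!buffer->begin) {
> 		buffer->begin = ptr;
> 	}
> 	buffer->end = ptr + length;
> }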
> 
> scoped_identifier returns [CombinedString name]
> 	@init { name.begin = NULL; name.end = NULL; }
> 	: ((a=NAMESPACE_COLON {Append($name, $a, $a.length);} )?
> 	((b=IDENTIFIER c=NAMESPACE_COLON)
> 		{Append($name, $b, $b.length); Append($name, $c, $c.length);} )*
> 	d=IDENTIFIER {Append($name, $d, $d.length);} );
> 
> >
> >> -----Original Message-----
> >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> >> bounces at antlr.org] On Behalf Of dev.null.nix at gmail.com
> >> Sent: Friday, August 24, 2007 5:33 AM
> >> To: antlr-interest at antlr.org
> >> Subject: [antlr-interest] C Runtime and Strings
> >>
> >> Hi,
> >>
> >> I have a rule like this in a C target language grammar:
> >>
> >> scoped_identifier : NAMESPACE_COLON? (IDENTIFIER NAMESPACE_COLON)*
> >> IDENTIFIER;
> >>
> >> I would like to return the string representing scoped_identifier.
> >>
> >> Is there any way to access the buffer location of a token in the
> >> input buffer (like storing a pointer to the first and the last char).
> >>
> >> And if there is a way, is the input buffer read into one big buffer
> >> so that it is valid for the whole parser?
> >>
> >> Thanks,
> >> Stefan
> >


