[antlr-interest] C runtime and aggregation in the parser

Wed Jul 14 08:40:11 PDT 2010

Don't use pANTLR3_STRING unless memory usage is not a concern. It is better to reference the token and use the pointers into the input stream that it contains directly. You can then create your std::string from them, or use the pointers directly in C (which is what I do). The $text syntax that yields a pANTLR3_STRING is nice and easy to use, but the runtime tracks that memory and does not release any of it until you release the parser, so it can use a lot of memory if you are not careful!
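
As a rough illustration of the token-pointer approach, here is a minimal C++ sketch. TokenSpan, spanText and joinSpans are hypothetical names that are not part of the ANTLR runtime; the struct only stands in for the start and stop pointers a token carries into the input buffer. The sketch assumes 8-bit input and that the stop pointer addresses the last character of the token (inclusive), so check both against your own runtime version.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the two pointers a token holds into the input
// buffer. Assumptions: the input is 8-bit text, and `stop` addresses the
// LAST character of the token (inclusive).
struct TokenSpan {
    const char* start;
    const char* stop;
};

// Copy the token's characters out of the input buffer only when they are
// actually needed, so no runtime-owned string object is created.
inline std::string spanText(const TokenSpan& t) {
    if (t.start == nullptr || t.stop == nullptr || t.stop < t.start) {
        return std::string();
    }
    return std::string(t.start,
                       static_cast<std::size_t>(t.stop - t.start) + 1);
}

// Aggregate two pieces of text into one caller-owned string.
inline std::string joinSpans(const TokenSpan& a, const TokenSpan& b) {
    std::string out = spanText(a);
    out += spanText(b);
    return out;
}
```

The resulting std::string is owned by the caller and is freed when it goes out of scope, instead of living until the parser is released.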

Jim 

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Richard Thrippleton
> Sent: Wednesday, July 14, 2010 4:04 AM
> To: Nathan Eloe
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] C runtime and aggregation in the parser
> 
> Nathan Eloe wrote:
> > Hello again,
> > I'm writing about a very specific problem I'm having with the C runtime.
> > One of the restrictions of the grammar I'm writing is that strings may
> > contain some specific characters (such as # or %), but other rules have
> > these as operators, and as such I can't just make a token to catch all
> > strings.  The only way around this I've found has been aggregating
> > allowable strings in the parser.
> > Example:
> > ns_str_agg
> >   : nsp=ns_str_part nsap=ns_str_aggp -> STRING[$nsp.text+$nsap.text]
> >   | ns_str_part
> >   | rw=res_word_str nsap=ns_str_aggp -> STRING[$rw.text+$nsap.text];
> >
> > This worked just fine when I was using the Java runtime (so I could use
> > the debugger and gunit to test my grammar).  When moving to the C
> > runtime, I get the following error (and lots of them):
> >
> > bashastParser.c: In function 'ns_str_agg':
> > bashastParser.c:42343: error: invalid operands to binary + (have
> > 'uint8_t *' and 'pANTLR3_STRING')
> >
> > I've attached the grammar to this email (I am attempting to recreate the
> > Bash grammar).  Is there some way around this or some way to correctly
> > do this kind of aggregation with the C runtime?
> Anything inside "[ ... ]" of a token constructor is native code (Java or C
> in your case), and all that is done to it by ANTLR is to expand the
> $-prefixed expressions.
> 
> In Java you were fine because the Java backend of ANTLR expands
> $something.text to be an expression of type String, and Java overloads the
> operator '+' to work as you'd expect.
> 
> In C, the $something.text expressions get expanded to be expressions that
> give you a pointer to an ANTLR3_STRING[1], and C has no idea what to do with
> those when applied to the '+' operator. Look at the functions in
> http://www.antlr.org/api/C/struct_a_n_t_l_r3___s_t_r_i_n_g__struct.html if
> you want to manipulate ANTLR3_STRINGs.
> 
> My own preferred approach is to use a C++ compiler and have a function
> that turns an ANTLR3_STRING into a std::string so I can do things like
> 	STRING[antlrStr($nsp.text) + antlrStr($nsap.text)]
> 
> Richard
> 
> [1] - I'm not sure why one of them seems to be being expanded into a
> uint8_t* in one case. I'd strongly encourage looking at the generated C.
> --
> \o/
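
The antlrStr helper mentioned in the quoted reply is not shown anywhere in the thread. The following is only a guess at what such a function might look like, written against a hypothetical stand-in type (RuntimeString) rather than the real runtime header, so that it compiles on its own. It assumes the runtime string exposes a byte pointer and a length, and that the text is 8-bit; a real version would take a pANTLR3_STRING and read the corresponding fields.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the runtime's string object. Assumption: the
// real structure exposes a pointer to 8-bit characters plus a length.
struct RuntimeString {
    const unsigned char* chars;
    std::size_t len;
};

// Copy the runtime string into a std::string so that operator+ works on it.
// A null pointer yields an empty string instead of crashing.
inline std::string antlrStr(const RuntimeString* s) {
    if (s == nullptr || s->chars == nullptr) {
        return std::string();
    }
    return std::string(reinterpret_cast<const char*>(s->chars), s->len);
}
```

With a helper of this shape, an expression such as antlrStr(a) + antlrStr(b) is ordinary std::string concatenation, which is why it compiles under a C++ compiler where the bare '+' on two runtime pointers does not.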