[antlr-interest] C runtime and aggregation in the parser

Nathan Eloe powerofazure at gmail.com
Wed Jul 14 07:28:12 PDT 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 7/14/10 6:03 AM, Richard Thrippleton wrote:
> Nathan Eloe wrote:
>> Hello again,
>> I'm writing about a very specific problem I'm having with the C runtime.
>> One of the restrictions of the grammar I'm writing is that strings may
>> contain some specific characters (such as # or %), but other rules have
>> these as operators, and as such I can't just make a token to catch all
>> strings.  The only way around this I've found has been aggregating
>> allowable strings in the parser.
>> Example:
>> ns_str_agg
>>   : nsp=ns_str_part nsap=ns_str_aggp -> STRING[$nsp.text+$nsap.text]
>>   | ns_str_part
>>   | rw=res_word_str nsap=ns_str_aggp -> STRING[$rw.text+$nsap.text];
>>
>> This worked just fine when I was using the java runtime (so I could use
>> the debugger and gunit to test my grammar).  When moving to the C
>> runtime, I get the following error (and lots of them):
>>
>> bashastParser.c: In function 'ns_str_agg':
>> bashastParser.c:42343: error: invalid operands to binary + (have
>> 'uint8_t *' and 'pANTLR3_STRING')
>>
>> I've attached the grammar to this email (I am attempting to recreate the
>> Bash grammar).  Is there some way around this or some way to correctly
>> do this kind of aggregation with the C runtime?
> Anything inside "[ ... ]" of a token constructor is native code (Java or
> C in your case), and all that is done to it by ANTLR is to expand the
> $-prefixed expressions.
> 
> In Java you were fine because the Java backend of ANTLR expands
> $something.text to be an expression of type String, and Java overloads
> the operator '+' to work as you'd expect.
> 
> In C, the $something.text expressions get expanded to be expressions
> that give you a pointer to an ANTLR3_STRING[1], and C has no idea what
> to do with those when applied to the '+' operator. Look at the functions
> in
> http://www.antlr.org/api/C/struct_a_n_t_l_r3___s_t_r_i_n_g__struct.html
> if you want to manipulate ANTLR3_STRINGs.
> 
> My own preferred approach is to use a C++ compiler and have a
> function that turns an ANTLR3_STRING into a std::string so I can do
> things like
>     STRING[antlrStr($nsp.text) + antlrStr($nsap.text)]
> 
> Richard
> 
> [1] - I'm not sure why one of them seems to be expanded into a
> uint8_t* in one case. I'd strongly encourage looking at the generated C.


Thank you so much for the reply.  I am using a C++ compiler, so I was
already thinking of doing exactly that or of overloading the + operator
(I've got a few options in mind).
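
For reference, here is a minimal sketch of the kind of conversion helper
Richard describes. The name antlrStr comes from his example; its body is
my own guess at an implementation, not his actual code. The small struct
is a hypothetical stand-in for the chars and len fields of ANTLR3_STRING
so that the snippet compiles on its own; in a real grammar you would
include <antlr3.h> instead. It also assumes 8-bit character data.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the two ANTLR3_STRING fields used below, so
// the sketch is self-contained. With the real runtime, include <antlr3.h>
// and delete this struct and typedef.
struct ANTLR3_STRING_struct {
    std::uint8_t* chars;  // character data (8-bit encoding assumed)
    std::uint32_t len;    // number of characters stored in chars
};
typedef ANTLR3_STRING_struct* pANTLR3_STRING;

// Assumed helper: copy an ANTLR string into a std::string so that the
// usual C++ operator+ can be used for concatenation.
std::string antlrStr(pANTLR3_STRING s) {
    // Treat a missing string as empty rather than dereferencing null.
    if (s == nullptr || s->chars == nullptr) {
        return std::string();
    }
    // Use the explicit length instead of relying on a terminating NUL.
    return std::string(reinterpret_cast<const char*>(s->chars), s->len);
}
```

With this in place, antlrStr($nsp.text) + antlrStr($nsap.text) is an
ordinary std::string expression. As for the other option, C++ does not
allow overloading operator+ when both operands are plain pointers such as
pANTLR3_STRING, so that route would need a small wrapper class or a
conversion like the one above.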

As for the uint8_t* (which was what was really confusing me, as I knew
about + not knowing what to do with pANTLR3_STRING), I didn't see
anything in the code that would cause that, but I'll give it another
look after I debug another problem.

Again, thanks, you've been very helpful.
Nathan
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.14 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkw9yXwACgkQFpoRlVgtqKa39wCdGSnZWm9lYg1kEXR7l5HhB2ZG
PukAoIvPgGi8pSdu7Z15q+PWyHkjTJI7
=nXiO
-----END PGP SIGNATURE-----

