[antlr-interest] ANTLR C empty strings and NULL ->chars

Wed Dec 16 21:20:36 PST 2009

Hi,

I'm hitting a problem relating to empty strings in ANTLR C
3.2, and I'm wondering whether it's a bug in the C runtime
or in how I'm using it.

I have a rule that uses SETTEXT() to set the token's text to
an empty string (see below).  Later on, using
$STRING.text->chars in an imaginary node rewrite rule
segfaults, because ->chars on an empty string is NULL rather
than "\0". toStringSS() uses newRaw() to create the string,
but no characters get appended.

Should newRaw8() be setting ->chars to a one-byte
null-valued array rather than NULL? I think the only place
where it will matter is toStringSS(), the rest of the calls
to newRaw() look like they'll call string->appendS() which
will set the chars pointer to a null-terminated string.
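
To make the proposal concrete, here is a minimal sketch in
plain C. It does not use the real runtime types: demo_string,
demo_new_raw() and demo_append() are hypothetical stand-ins
for ANTLR3_STRING, newRaw8() and appendS(), and an actual fix
in the runtime may well look different.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the runtime's string type;
 * this is NOT the real ANTLR3_STRING layout. */
typedef struct demo_string {
    char   *chars;  /* always points at a NUL-terminated buffer */
    size_t  len;    /* number of characters, excluding the terminator */
    size_t  size;   /* bytes allocated for chars */
} demo_string;

/* Hypothetical analogue of newRaw8(): allocate the one-byte
 * terminator up front, so chars is never NULL even if nothing
 * is ever appended. */
static demo_string *demo_new_raw(void)
{
    demo_string *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;

    s->chars = malloc(1);
    if (s->chars == NULL) {
        free(s);
        return NULL;
    }
    s->chars[0] = '\0';
    s->len  = 0;
    s->size = 1;
    return s;
}

/* Hypothetical analogue of appendS(): grow the buffer and keep
 * it NUL-terminated. Returns 0 on success, -1 on allocation failure. */
static int demo_append(demo_string *s, const char *text)
{
    assert(s != NULL && s->chars != NULL && text != NULL);

    size_t add   = strlen(text);
    char  *grown = realloc(s->chars, s->len + add + 1);
    if (grown == NULL)
        return -1;

    memcpy(grown + s->len, text, add + 1);  /* copies the terminator too */
    s->chars = grown;
    s->len  += add;
    s->size  = s->len + 1;
    return 0;
}

static void demo_free(demo_string *s)
{
    if (s != NULL) {
        free(s->chars);
        free(s);
    }
}
```

With this shape, a string that never has anything appended
still has a ->chars that is safe to dereference, at the cost
of one extra byte per raw string.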

Cheers,
Matt

STRING
@declarations
{
        pANTLR3_STRING tmp;
}
        : '"' ~('"')* '"'
        {
         // remove the string quotes from the token
         tmp = GETTEXT();
         SETTEXT(tmp->subString(tmp, 1, tmp->len-1));
        }
        ;

Parsing "" sets the token's text to empty. 
(It's an ugly rule but it's a workaround for something else.)

There's then something like:

plainvalue
   : STRING
   -> ^(PLAINVALUE[$STRING.text->chars])
   ;

It's then failing in the tree grammar when it tries to use
$PLAINVALUE.text.
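
A caller-side guard would avoid dereferencing the NULL
pointer regardless of what the runtime does. A rough sketch,
where chars_or_empty() and text_length() are hypothetical
helpers and not part of the ANTLR C runtime; they take a
plain const char *, so the ->chars pointer would need casting
before being passed in:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the ANTLR C runtime:
 * map a NULL character pointer to a valid empty string. */
static const char *chars_or_empty(const char *chars)
{
    return chars != NULL ? chars : "";
}

/* Example use: measuring the text no longer crashes when the
 * underlying pointer is NULL. */
static size_t text_length(const char *chars)
{
    const char *safe = chars_or_empty(chars);
    assert(safe != NULL);
    return strlen(safe);
}
```

This only papers over the symptom at each call site, so it is
a stopgap rather than a replacement for fixing the empty
string itself.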