[antlr-interest] Easier way to do string literals?

Sun Oct 14 23:57:29 PDT 2007

At 18:13 15/10/2007, Rick Mann wrote:
 >StringLiteral returns [String s]
 >     :  '"' StringGuts '"' { $s = $StringGuts.text; }
 >     ;
[...]
 >But it's not really working quite like I'd expect. The resulting
 >text includes the quotes, and the escapes don't seem to really
 >turn into the actual characters (I realize I need something more
 >there).

Lexer rules don't support return values (since they already have a 
return value: the token), so your "returns" block won't have any 
effect there.  That's why you're still getting the quotes.  (There 
should be a warning/error message about this, but apparently 
that's not possible until ANTLR3 becomes self-hosted.)

There's an example in the wiki showing how to get rid of the 
quotes by using setText, which is probably what you want 
instead.  (FYI: setText creates a copy of the token text, whereas 
emit will use the same text as the main token stream.  Which means 
emit is faster but a little more finicky -- and not really 
suitable in your case, since you also want to munge the internal 
text by parsing the escapes.)

 >I need to also run through the text and handle the escapes. This
 >seems like the wrong approach, since it means I'm writing parse
 >code in Java, which strikes me as underutilizing ANTLR.

Well, you're always going to have to write your own escape-parsing 
code, since ANTLR can't make any guesses about what you want \n to 
mean.  Maybe it's a newline; maybe it's a placeholder for "the 
contents of variable 'n'", maybe it's something even more 
esoteric.

StringLiteral
   : '"' StringGuts '"' { setText(ParseEscapes($StringGuts.text)); }
   ;
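
ParseEscapes in the rule above is a helper you'd write yourself; it 
isn't part of ANTLR.  As a rough sketch only, assuming Java-style 
escapes (\n, \t, \r, \" and \\) and assuming that unknown escapes 
should be left untouched, it might look something like this.  The 
class name EscapeSketch is made up for illustration, and the method 
keeps the capitalised name only so that it matches the action above:

```java
public final class EscapeSketch {
    private EscapeSketch() {}

    /**
     * Replaces backslash escapes in the text found between the quotes.
     * The set of escapes handled here is an assumption, not something
     * ANTLR defines; adjust it to whatever your language needs.
     */
    public static String ParseEscapes(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c != '\\' || i + 1 >= raw.length()) {
                // Ordinary character, or a lone backslash at the very end.
                out.append(c);
                continue;
            }
            char e = raw.charAt(++i); // the character after the backslash
            switch (e) {
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case 'r':  out.append('\r'); break;
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                default:
                    // Unknown escape: keep it verbatim (assumed policy).
                    out.append('\\').append(e);
            }
        }
        return out.toString();
    }
}
```

If you keep it in a separate class like this, the action would call 
EscapeSketch.ParseEscapes(...) instead; alternatively the method 
could go in the lexer's @lexer::members section so the unqualified 
call works as written.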

And yes, you have to use setText at this level.  setText has no 
effect in a fragment rule, so you can't handle it inside 
EscapeSequence itself.  Which would be nice, but it's just not 
possible without a lot of dancing around.