[antlr-interest] Easier way to do string literals?

Mon Oct 15 00:29:47 PDT 2007

At 20:18 15/10/2007, Vaclav Barta wrote:
 >quotedString returns [ String value ]
 >@init {
 >	StringBuffer sb;
 >} : {
 >	sb = new StringBuffer();
 >}
 >	DQUOTE (
 >		EscapeSequence { sb.append($EscapeSequence.getText()); }
 >		| BareString { sb.append($BareString.getText()); }
 >	)* DQUOTE { $value = sb.toString(); }
 >	;

That sort of thing is fine if all you're parsing is string 
constants, but in a larger language it breaks down.  For one 
thing, you've probably got a lexer rule that strips whitespace 
automatically, whereas whitespace needs to be preserved within 
strings.  You're also quite likely to get stray Identifier, Number, 
etc. tokens in there, not just EscapeSequences and BareStrings.  
And comment markers are a problem too: block and line comment 
markers within a string have to be treated as part of the string, 
not as the start of a comment.  So that's something else you'd 
have to hoist to parser level if you did things this way.  It's 
just messy.

Now what you *could* do is to treat it like the island grammar 
example and have a separate ANTLR grammar for parsing the 
internals of strings, but that seems excessive to me for what 
amounts to a simple string replace operation.
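To make the "simple string replace operation" concrete, here is a 
minimal sketch. It assumes the lexer matches the whole literal 
(quotes, whitespace and comment markers included) as a single 
token, and that a helper then strips the quotes and replaces the 
escapes. The class StringLiterals and its method unescape are 
hypothetical, not part of ANTLR, and only the C/Java-style escapes 
\n, \t, \r, \", \' and \\ are assumed.

```java
// Hypothetical helper, not part of ANTLR's API. Assumes the lexer has
// already matched the whole literal (e.g. "a\tb \"c\"") as ONE token, so
// whitespace and comment markers inside it were never split off or skipped.
// Only the escapes \n, \t, \r, \", \' and \\ are assumed to be supported.
final class StringLiterals {
    private StringLiterals() {}

    /** Strips the surrounding double quotes and replaces escape sequences. */
    static String unescape(String tokenText) {
        if (tokenText.length() < 2 || tokenText.charAt(0) != '"'
                || tokenText.charAt(tokenText.length() - 1) != '"') {
            throw new IllegalArgumentException("not a quoted literal: " + tokenText);
        }
        String body = tokenText.substring(1, tokenText.length() - 1);
        StringBuilder sb = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                sb.append(c); // ordinary character: spaces, "//", "/*" all kept as-is
                continue;
            }
            if (++i >= body.length()) {
                throw new IllegalArgumentException("dangling backslash in: " + tokenText);
            }
            char e = body.charAt(i);
            switch (e) {
                case 'n':  sb.append('\n'); break;
                case 't':  sb.append('\t'); break;
                case 'r':  sb.append('\r'); break;
                case '"':  sb.append('"');  break;
                case '\'': sb.append('\''); break;
                case '\\': sb.append('\\'); break;
                default:
                    throw new IllegalArgumentException("unknown escape \\" + e);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Token text as the lexer would hand it over, quotes included.
        String text = "\"a\\tb  /* not a comment */ \\\"c\\\"\"";
        System.out.println(unescape(text));
    }
}
```

From an ANTLR 3 parser rule you might then write something like 
{ $value = StringLiterals.unescape($STRING.text); }, assuming a 
lexer rule named STRING.  Because the literal never gets split 
into sub-tokens, the whitespace-stripping and comment-marker 
problems described above simply don't arise.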