[antlr-interest] Easier way to do string literals?

Mon Oct 15 12:08:55 PDT 2007

At 21:37 15/10/2007, Vaclav Barta wrote:
 >Is there really no way to parse C-like string literals in one
 >pass?

Not no way, it's just tricky :)

To do it in one pass you really have to do everything in the 
lexer.  This requires the final lexer rule to use setText, as I 
previously mentioned; but it's complicated by the fact that only 
the top-level lexer rule can successfully call setText (subrules 
can try but it won't accomplish anything useful), and that lexer 
rules don't support return values.

I haven't tried it, but I think fragment rules do still support 
parameters, so something like this might work:

STRING
@init {
   // Local declared in @init; referenced as plain sb (no '$' prefix).
   StringBuffer sb = new StringBuffer();
}
   : '"' (EscapeSequence[sb] | StringChar[sb])* '"' { setText(sb.toString()); }
   ;

fragment EscapeSequence[StringBuffer sb]
   : '\\'
     ( '\\' { $sb.append('\\'); }
     | 't' { $sb.append('\t'); }
     ...
     )
   ;

fragment StringChar[StringBuffer sb]
   : x=(~('\\' | '"')) { $sb.append($x.text); }
   ;

I seem to vaguely recall ANTLR not liking something similar to the 
syntax I've used in StringChar, so you might need to move the 
action code for that up into the STRING rule.
I also don't recall if you need to escape the backslashes in the 
append calls in EscapeSequence or not.  So you'll probably need to 
experiment a bit.
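For comparison, the same single-pass idea can be sketched by hand in plain Java, outside ANTLR. One buffer is created by the top-level routine and passed down to helpers that mirror STRING, EscapeSequence and StringChar, so the decoded text is built while the input is scanned. The class name CStringLiteral and the escape set (\\, \t, \n, \r, \") are assumptions for illustration. This is not ANTLR-generated code, and it is not the grammar above.

```java
// Hand-written single-pass sketch (hypothetical class, not ANTLR output).
// One StringBuilder is shared by the helpers, mirroring the grammar's
// EscapeSequence[sb] / StringChar[sb] parameter passing.
public class CStringLiteral {
    private final String in;
    private int pos = 0;

    private CStringLiteral(String in) {
        this.in = in;
    }

    /** Decodes a complete literal such as "a\tb" (quotes included). */
    public static String parse(String literal) {
        return new CStringLiteral(literal).string();
    }

    // Mirrors STRING: '"' (EscapeSequence | StringChar)* '"'
    private String string() {
        StringBuilder sb = new StringBuilder();
        expect('"');
        while (true) {
            if (pos >= in.length()) {
                throw new IllegalArgumentException("unterminated string literal");
            }
            char c = in.charAt(pos);
            if (c == '"') {
                break;
            } else if (c == '\\') {
                escapeSequence(sb);
            } else {
                stringChar(sb);
            }
        }
        expect('"');
        if (pos != in.length()) {
            throw new IllegalArgumentException("trailing characters after literal");
        }
        return sb.toString();
    }

    // Mirrors EscapeSequence[sb]; the supported escapes are an assumption.
    private void escapeSequence(StringBuilder sb) {
        expect('\\');
        if (pos >= in.length()) {
            throw new IllegalArgumentException("dangling backslash");
        }
        char c = in.charAt(pos++);
        switch (c) {
            case '\\': sb.append('\\'); break;
            case 't':  sb.append('\t'); break;
            case 'n':  sb.append('\n'); break;
            case 'r':  sb.append('\r'); break;
            case '"':  sb.append('"');  break;
            default:
                throw new IllegalArgumentException("unknown escape: \\" + c);
        }
    }

    // Mirrors StringChar[sb]: any character other than '\\' or '"'.
    private void stringChar(StringBuilder sb) {
        sb.append(in.charAt(pos++));
    }

    // Consumes one expected character or reports where it was missing.
    private void expect(char want) {
        if (pos >= in.length() || in.charAt(pos) != want) {
            throw new IllegalArgumentException("expected '" + want + "' at " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        // Input is the literal "tab:\there"; the \t is decoded to a real tab.
        System.out.println(parse("\"tab:\\there\""));
    }
}
```

StringBuilder is used here because nothing is shared between threads. The grammar's StringBuffer works the same way, just with synchronization overhead.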