[antlr-interest] Easier way to do string literals?
Gavin Lambert
antlr at mirality.co.nz
Mon Oct 15 12:08:55 PDT 2007
At 21:37 15/10/2007, Vaclav Barta wrote:
>Is there really no way to parse C-like string literals in one
>pass?
Not no way, it's just tricky :)
To do it in one pass you really have to do everything in the
lexer. As I previously mentioned, this requires the top-level
lexer rule to call setText. Two things complicate it: only the
top-level lexer rule can usefully call setText (subrules can call
it, but it won't accomplish anything useful), and lexer rules
don't support return values, so subrules can't hand text back up.
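For comparison, here is a minimal hand-written sketch of the same one-pass idea in Java (the language of the grammar's actions), outside ANTLR. The helper methods deliberately return nothing, mimicking lexer rules' lack of return values. Instead they append into a buffer the caller passes in, and only the top-level method turns that buffer into the final text, which is the role setText plays in the lexer. The class `OnePassString` and its method names are hypothetical, `StringBuilder` stands in for `StringBuffer`, and only the `\\` and `\t` escapes are handled.

```java
// Hypothetical hand-written analogue of the STRING / EscapeSequence /
// StringChar rules; this is not ANTLR-generated code.
public class OnePassString {
    private final String input;
    private int pos = 0;

    public OnePassString(String input) {
        this.input = input;
    }

    // Top-level "rule": the only place the final text is produced,
    // like the setText call in the STRING lexer rule.
    public String string() {
        StringBuilder sb = new StringBuilder();
        expect('"');
        while (pos < input.length() && input.charAt(pos) != '"') {
            if (input.charAt(pos) == '\\') {
                escapeSequence(sb);
            } else {
                stringChar(sb);
            }
        }
        expect('"'); // throws if the literal is unterminated
        return sb.toString();
    }

    // "Fragment" helpers return nothing; they append to the caller's
    // buffer, like fragment rules that take a StringBuffer parameter.
    private void escapeSequence(StringBuilder sb) {
        expect('\\');
        if (pos >= input.length()) {
            throw new IllegalArgumentException("unterminated escape");
        }
        char c = input.charAt(pos++);
        switch (c) {
            case '\\': sb.append('\\'); break;
            case 't':  sb.append('\t'); break;
            // Other escapes are left out of this sketch.
            default:
                throw new IllegalArgumentException("unsupported escape: \\" + c);
        }
    }

    private void stringChar(StringBuilder sb) {
        sb.append(input.charAt(pos++));
    }

    private void expect(char c) {
        if (pos >= input.length() || input.charAt(pos) != c) {
            throw new IllegalArgumentException("expected '" + c + "' at " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        // The Java literal below is the 9 raw characters "a\tb\\c" (quotes included).
        System.out.println(new OnePassString("\"a\\tb\\\\c\"").string());
    }
}
```

Running `main` prints `a`, a tab, `b`, a single backslash, and `c`.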
I haven't tried it, but I think fragment rules do still support
parameters, so something like this might work:
STRING
@init {
    StringBuffer sb = new StringBuffer();
}
    : '"' (EscapeSequence[sb] | StringChar[sb])* '"' { setText(sb.toString()); }
    ;
fragment EscapeSequence[StringBuffer sb]
    : '\\'
      ( '\\' { $sb.append('\\'); }
      | 't'  { $sb.append('\t'); }
      ...
      )
    ;
fragment StringChar[StringBuffer sb]
    : x=(~('\\' | '"')) { $sb.append($x.text); }
    ;
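The post elides the remaining EscapeSequence alternatives. Purely as an illustration, the Java sketch below maps the character that follows a backslash to the character it denotes, for a C-like set of simple escapes. Each case corresponds to what one grammar alternative would append to the buffer. The chosen set of escapes, the class name `CEscapes`, and the method `unescape` are assumptions. Octal, hex, and universal-character escapes are not handled.

```java
// Hypothetical helper: maps the character after a backslash to the
// character it denotes. The escape set is an assumption (C-like simple escapes).
public final class CEscapes {
    private CEscapes() {}

    public static char unescape(char c) {
        switch (c) {
            case '\\': return '\\';
            case '"':  return '"';
            case '\'': return '\'';
            case '?':  return '?';
            case 't':  return '\t';
            case 'n':  return '\n';
            case 'r':  return '\r';
            case 'b':  return '\b';
            case 'f':  return '\f';
            case 'a':  return (char) 0x07; // bell; Java has no '\a' literal
            case 'v':  return (char) 0x0B; // vertical tab; Java has no '\v' literal
            default:
                // Octal/hex escapes would need more than one character of lookahead.
                throw new IllegalArgumentException("unsupported escape: \\" + c);
        }
    }

    public static void main(String[] args) {
        System.out.println((int) unescape('n')); // prints 10, the code point of '\n'
    }
}
```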
I seem to vaguely recall ANTLR not liking something similar to the
syntax I've used in StringChar, so you might need to move the
action code for that up into the STRING rule.
I also don't recall if you need to escape the backslashes in the
append calls in EscapeSequence or not. So you'll probably need to
experiment a bit.
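On the Java side, at least, the char literal `'\\'` already denotes a single backslash. So if ANTLR copies the action text into the generated lexer unchanged, `$sb.append('\\')` appends exactly one backslash. Whether ANTLR's own action handling alters backslashes is not confirmed here. This small sketch (the class name `BackslashDemo` is hypothetical) only shows how Java itself reads the literals used in the EscapeSequence actions.

```java
// Shows how the Java compiler reads the escaped char literals used in the
// EscapeSequence actions, assuming ANTLR passes them through unchanged.
public class BackslashDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        sb.append('\\'); // the literal '\\' is ONE backslash character
        sb.append('\t'); // the literal '\t' is one tab character
        System.out.println(sb.length());        // prints 2
        System.out.println((int) sb.charAt(0)); // prints 92, the backslash code point
        System.out.println((int) sb.charAt(1)); // prints 9, the tab code point
    }
}
```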