[antlr-interest] Easier way to do string literals?
Vaclav Barta
vbar at comp.cz
Mon Oct 15 01:37:28 PDT 2007
Gavin Lambert wrote:
> At 20:18 15/10/2007, Vaclav Barta wrote:
> >quotedString returns [ String value ]
> >@init {
> > StringBuffer sb;
> >} : {
> > sb = new StringBuffer();
> >}
> > DQUOTE (
> > EscapeSequence { sb.append($EscapeSequence.getText()); }
> > | BareString { sb.append($BareString.getText()); }
> > )* DQUOTE { $value = sb.toString(); }
> > ;
>
> That sort of thing is fine if all you're parsing is string constants,
> but in a larger language it loses (apart from anything else, you've
> probably got an auto-whitespace-stripper, whereas whitespace needs to be
Sorry, I've simplified too much - the original has
quotedString returns [ String value ]
@init {
    StringBuffer sb;
} : {
    sb = new StringBuffer();
}
    DQUOTE (
        EscapeSequence { sb.append($EscapeSequence.getText()); }
      | BareString { sb.append($BareString.getText()); }
      | COLON { sb.append(':'); }
      | EQ { sb.append('='); }
      | SP { sb.append($SP.getText()); }
      | TAB { sb.append('\t'); }
      | StringChar { sb.append($StringChar.getText()); }
      | v = varUse { sb.append($v.value); }
    )* DQUOTE { $value = sb.toString(); }
    ;
and the whole grammar (I've put it at
http://mangrove.cz/antmaker/Loader.g - it's just an experiment with
Makefile-like syntax, converting build instructions to Ant XML) is
indeed a bit atypical in that it handles whitespace explicitly...
> preserved within strings). And you're quite likely going to get random
> Identifier and Number etc tokens in there, not just EscapeSequences and
> BareStrings. And unmatched comments, too -- block and line comment
...doesn't distinguish quoted from unquoted strings; identifiers and
numbers are just strings, and if it had comments, they would be line
comments whose marker would need its own branch inside quotedString -
so the example probably isn't as widely applicable as I've implied, :-)
but I'd still like to parse string literals (ones complicated enough to
need parsing) with ANTLR...
> Now what you *could* do is to treat it like the island grammar example
> and have a separate ANTLR grammar for parsing the internals of strings,
> but that seems excessive to me for what amounts to a simple string
> replace operation.
Is there really no way to parse C-like string literals in one pass?
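To make clear what I mean by "one pass": outside ANTLR, a hand-written
decoder can walk the literal once, copying ordinary characters
(whitespace included) and translating escapes as it meets them. The
sketch below is only an illustration in plain Java. The class name
CStringLiteral, the method unescape and the set of escapes it accepts
are assumptions made for the example; they are not taken from Loader.g,
and variable references like varUse are left out.

```java
public final class CStringLiteral {
    private CStringLiteral() {}

    /**
     * Decodes a double-quoted literal (quotes included) in a single
     * left-to-right pass. The escape set (\n, \t, \r, \\, \") is an
     * assumption for this sketch, not the grammar's EscapeSequence.
     */
    public static String unescape(String literal) {
        int n = literal.length();
        if (n < 2 || literal.charAt(0) != '"' || literal.charAt(n - 1) != '"') {
            throw new IllegalArgumentException("not a quoted string: " + literal);
        }
        StringBuilder sb = new StringBuilder(n);
        // Walk only the characters between the two delimiting quotes.
        for (int i = 1; i < n - 1; i++) {
            char c = literal.charAt(i);
            if (c == '"') {
                throw new IllegalArgumentException("unescaped quote at index " + i);
            }
            if (c != '\\') {
                sb.append(c); // ordinary character, whitespace is kept as-is
                continue;
            }
            // A backslash must be followed by an escape character
            // before the closing quote.
            if (++i >= n - 1) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char e = literal.charAt(i);
            switch (e) {
                case 'n': sb.append('\n'); break;
                case 't': sb.append('\t'); break;
                case 'r': sb.append('\r'); break;
                case '\\': sb.append('\\'); break;
                case '"': sb.append('"'); break;
                default:
                    throw new IllegalArgumentException("unknown escape: \\" + e);
            }
        }
        return sb.toString();
    }
}
```

What I'm after is the same effect from the grammar itself, i.e. the
rule's return value already holding the decoded text.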
Bye
Vasek