[antlr-interest] Allowing and maintaining space characters in string literals

William Clodius wclodius at los-alamos.net
Thu Sep 8 19:50:33 PDT 2011


Janet:

I wouldn't handle this as a lexical problem; I would handle it as a syntactic problem, supplemented if necessary by semantic analysis. For example, what do you want to do if there are multiple spaces between runs of characters? What I suggest is roughly:

string_text: STRTEXT+;
STRTEXT: STRCHAR+;
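
As a rough illustration of what those two rules amount to, here is a minimal Java sketch (hand-written, not ANTLR-generated code). It assumes the lexer skips blanks and that STRCHAR is a single letter, digit, or one of . / - & as in the quoted grammar. The names StringTextSketch, lexStrText, and joinStringText are hypothetical. Rejoining with one space is just one possible semantic choice, and it shows why the multiple-spaces question matters: the original spacing is gone by then.

```java
import java.util.ArrayList;
import java.util.List;

public class StringTextSketch {
    // Hypothetical stand-in for a single STRCHAR: letter, digit, or . / - &
    public static boolean isStrChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || "./-&".indexOf(c) >= 0;
    }

    // Lexer step: one STRTEXT per maximal run of STRCHARs.
    // Blanks and tabs are skipped; any other character stops the scan.
    public static List<String> lexStrText(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t') { i++; continue; }
            if (!isStrChar(c)) break;
            int start = i;
            while (i < input.length() && isStrChar(input.charAt(i))) i++;
            tokens.add(input.substring(start, i));
        }
        return tokens;
    }

    // Parser plus semantic step: string_text is STRTEXT+,
    // rejoined here with exactly one space between tokens (an assumption).
    public static String joinStringText(List<String> strTexts) {
        return String.join(" ", strTexts);
    }

    public static void main(String[] args) {
        List<String> tokens = lexStrText("two   words");
        System.out.println(tokens);                 // prints [two, words]
        System.out.println(joinStringText(tokens)); // prints two words
    }
}
```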
  
On Sep 8, 2011, at 12:08 PM, <Janet.Hurwitz at usc-bt.com> wrote:

> Hello- I'm working on a grammar that needs to support embedded blanks in strings: "identifier=two words"
> The interpreter keeps breaking at 'two' and doesn't know what to do with 'words'.
> I was initially ignoring white space (because 'id1 = oneword, id2 =" two words"' must also be supported, with spaces around the = and the ,), but obviously I can't do that.
> I have tried what was suggested in an archived post:
> 
> STRING_LITERAL : (STRCHAR)+ ( ((' ')+ STRCHAR)=> (' ')+ (STRCHAR)+ )*
> 
> But that didn't work either! (no viable alternative at input 'words'). It's not including 'words' as part of the string.
> 
> In my grammar:
> fragment LETTER :('a'..'z' | 'A'..'Z');
> fragment DIGIT : '0'..'9';
> fragment OTHERCHARS : ('.' | '/' | '-' | '&');
> STRCHAR : (LETTER | DIGIT | OTHERCHARS)+;

Note: I find the + at the end of STRCHAR odd, given that STRING_LITERAL already applies + to STRCHAR.

> 
> I have tried various combinations of handling the blank in the lexer vs. the parser, including trying to create a quoted-string rule.
> I will have to support the following:
> 
> "identifier =two words"
> identifier ="two words"
> 
> The identifier=value pairs appear in a comma-separated line. There are various nested structures of identifier=value pairs involved, which is why both of the above formats are supported.
> 
> *** Bottom line *** I just want to indicate: if a space appears between quotation marks, include it as part of the current token; if not, throw it away.
> 
> I have everything working in a complex structure and tree walker except for the embedded blanks allowed in strings! Any suggestions are appreciated.
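
For comparison, the "bottom line" rule stated above can be sketched as a small hand-written scanner in Java. This is only an illustration of the rule, not ANTLR code and not a drop-in fix for the grammar. The class and method names (QuoteAwareScanner, tokenize) are hypothetical, and it assumes '=' and ',' are the only punctuation tokens and that quotes are not escaped or nested.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareScanner {
    // '=' and ',' become single-character tokens. A double-quoted run becomes
    // one token with its blanks kept (the quotes themselves are dropped).
    // Blanks outside quotes only separate tokens and are thrown away.
    // An unterminated quote simply yields whatever was collected so far.
    public static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes) {
                    // Closing quote: emit the quoted text, even if empty.
                    tokens.add(current.toString());
                    current.setLength(0);
                } else {
                    flush(tokens, current);
                }
                inQuotes = !inQuotes;
            } else if (inQuotes) {
                current.append(c); // blanks inside quotes are kept
            } else if (c == ' ' || c == '\t') {
                flush(tokens, current); // blanks outside quotes are dropped
            } else if (c == '=' || c == ',') {
                flush(tokens, current);
                tokens.add(String.valueOf(c));
            } else {
                current.append(c);
            }
        }
        flush(tokens, current);
        return tokens;
    }

    // Emit the pending unquoted token, if there is one.
    private static void flush(List<String> tokens, StringBuilder current) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        for (String t : tokenize("id1 = oneword, id2 =\" two words\"")) {
            System.out.println("[" + t + "]");
        }
    }
}
```

With this sketch, the second supported form (identifier ="two words") yields the tokens identifier, =, and two words. The first form ("identifier =two words"), where the whole pair is quoted, comes out as a single token that would need a second pass to split at the =.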


