[antlr-interest] Allowing and maintaining space characters in string literals

Janet.Hurwitz at usc-bt.com Janet.Hurwitz at usc-bt.com
Thu Sep 8 12:13:14 PDT 2011


Thanks for the reply, John.
I have had a lot of success with the interpreter! But perhaps I've hit a quirk.
I have tried lexing the quotation marks together with the literal, but without success. I have another case where I'm also using quotation marks in parser rules, and perhaps there is interference there.
The lexer rule you suggested broke my grammar elsewhere, but I will continue trying to work with it. I will also check the Java grammar example.

-----Original Message-----
From: John B. Brodie [mailto:jbb at acm.org] 
Sent: Thursday, September 08, 2011 2:58 PM
To: Hurwitz, Janet
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Allowing and maintaining space characters in string literals

Greetings!

Have you looked at the Java grammar in the v3 example suite?
also....

On Thu, 2011-09-08 at 18:08 +0000, Janet.Hurwitz at usc-bt.com wrote:
> Hello- I'm working on a grammar that needs to support embedded blanks in strings: "identifier=two words"
> The interpreter keeps breaking at 'two' and doesn't know what to do with 'words'.

don't use the interpreter. it has some quirks.

> I was initially ignoring white space (because 'id1 = oneword, id2 =" two words"' must also be supported, with spaces around the = and the ,), but obviously I can't do that.
> I have tried what was suggested in an archived post:
> 
> STRING_LITERAL : (STRCHAR)+ ( ((' ')+ STRCHAR)=> (' ')+ (STRCHAR)+ )*

are you lexing the leading/trailing quote marks separately from the characters comprising the string literal?

if so don't do that.

> But that didn't work either! (no viable alternative at input 'words'). It's not including 'words' as part of the string.
> 
> In my grammar:
> fragment LETTER : ('a'..'z' | 'A'..'Z');
> fragment DIGIT : '0'..'9';
> fragment OTHERCHARS : ('.' | '/' | '-' | '&');
> STRCHAR : (LETTER | DIGIT | OTHERCHARS)+;
> 
> I have tried various combinations of handling the blank in the lexer vs. the parser, including trying to create a quoted-string rule.
> I will have to support the following:

you want the string literal to be processed completely by the lexer, from the opening quote up to and including the closing quote. that way no other tokens will interfere with handling the characters between the quote marks.

> 
> "identifier =two words"
> identifier ="two words"
> 
> The identifier=value pairs appear in a comma-separated line. There are various nested structures of identifier=value pairs involved, which is why both of the above formats are supported.
> 
> *** Bottom line*** I just want to indicate: If a space appears between quotation marks, include it as part of the current token; if not, throw it away.
> 
> I have everything working in a complex structure and tree walker except for the embedded blanks allowed in strings! Any suggestions are appreciated.

these lexer rules work for me:

STRING : '"' (options{greedy=false;}:( ~('\\'|'"') | ('\\' '"')))* '"'; 

WS : ( ' ' | '\t' | '\f' | '\r' | '\n' )+ { $channel=HIDDEN; } ;
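
For illustration only, here is a minimal sketch of how these two rules might fit into a complete ANTLR v3 grammar for comma-separated identifier=value pairs. The grammar name Pairs and the rules line, pair, value, and ID are hypothetical and not part of the original grammar; ID simply reuses the character set from the STRCHAR rule quoted above.

```antlr
grammar Pairs;   // hypothetical grammar name

// Parser rules (hypothetical): a comma-separated line of identifier=value pairs.
line  : pair (',' pair)* EOF ;
pair  : ID '=' value ;
value : ID | STRING ;

// Hypothetical unquoted word, using the LETTER/DIGIT/OTHERCHARS characters from the question.
ID : ('a'..'z' | 'A'..'Z' | '0'..'9' | '.' | '/' | '-' | '&')+ ;

// The whole literal, quotes included, is a single token, so blanks inside it are kept.
STRING : '"' (options{greedy=false;}:( ~('\\'|'"') | ('\\' '"')))* '"';

// Blanks outside a string literal go to the hidden channel and never reach the parser.
WS : ( ' ' | '\t' | '\f' | '\r' | '\n' )+ { $channel=HIDDEN; } ;
```

With rules like these, the input id1 = oneword, id2 =" two words" should lex as ID '=' ID ',' ID '=' STRING, with the blanks between the quotes kept in the STRING text. The form "identifier =two words", where the quotes wrap the whole pair, would come through as a single STRING and would need separate handling.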

Hope this helps...
   -jbb



