[antlr-interest] lexer: embedded quotes assistance

Gavin Lambert antlr at mirality.co.nz
Thu Aug 23 13:02:01 PDT 2007


At 07:39 24/08/2007, Edwards, Waverly wrote:
 >Well after a lot more reading it is still not easy.  Below is what
 >I've done to capture the quoted text.  I'm hoping someone can
 >assist me.  My grammar is very short as at this point I just need
 >to get past capturing quotations with embedded quotes.  Below that
 >is the code I used to test what I was doing was correct ( or
 >seemingly correct ).  The code below the grammar works just fine.
 >I decided that instead of making an exception to allow the
 >continuation across lines I would throw an error if there was not
 >one, thereby saving myself more headaches.

Not tested, but this ought to do what you want:

DBLQUOTE
   :  '"'
      (~'"' | '""')*
      '"'
   ;

(This will also permit line breaks inside 
strings.  If you want to disallow that then just 
change the ~'"' into ~('"' | NEWLINE).)
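To try the matching behaviour outside a generated lexer, the same pattern can be approximated with a plain Java regex. This is only a sketch: the class and method names are made up for illustration and are not part of ANTLR, and a regex is a stand-in for the lexer rule, not what ANTLR generates.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of ANTLR): mirrors the DBLQUOTE rule
//   '"' ( ~'"' | '""' )* '"'
// with a regex so the rule's behaviour can be checked in isolation.
final class DblQuoteCheck {
    // The possessive quantifier (*+) stops the regex engine from backing out
    // of a doubled quote to fake a terminator on an unterminated string.
    private static final Pattern DBLQUOTE =
        Pattern.compile("\"(?:[^\"]|\"\")*+\"");

    // Returns the quoted string at the start of input, or null if none matches.
    static String matchAtStart(String input) {
        Matcher m = DBLQUOTE.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }
}
```

Given the input "say ""hi"" now" rest, matchAtStart returns everything up to and including the final quote before " rest". An input such as "a"" (which ends on a doubled quote with no terminator) yields null.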

This will match the entire string properly 
(detecting embedded double-quotes vs. the string 
terminator), but won't strip the surrounding 
quotes nor remove one of each quote pair.  I'm 
not a Javaite, so I can't give you exact code, 
but something along these lines ought to do that:

   String original = $getText();
   // substring's end index is exclusive, so length() - 1 drops only the closing quote
   String text = original.substring(1, original.length() - 1);
   text = text.replace("\"\"", "\"");
   $setText(text);
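The same clean-up can be written as a standalone Java method, which makes it easy to test separately from the grammar. This is a sketch under assumptions: the class and method names are hypothetical, and the argument is assumed to be the full matched token text including both surrounding quotes. Note that Java's substring takes an exclusive end index, so length() - 1 is what drops just the closing quote.

```java
// Hypothetical helper (names are illustrative, not from ANTLR).
final class QuotedText {
    // Strips the surrounding quotes and collapses each "" pair into a single ".
    static String unquote(String original) {
        int n = original.length();
        if (n < 2 || original.charAt(0) != '"' || original.charAt(n - 1) != '"') {
            throw new IllegalArgumentException("not a double-quoted string: " + original);
        }
        // End index is exclusive: this keeps characters 1 .. n-2.
        String inner = original.substring(1, n - 1);
        return inner.replace("\"\"", "\"");
    }
}
```

In a lexer action this could be called on the token text before setting it back, which keeps the action itself to a single line.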

To do your line continuation you just need to add 
an extra alternative to the loop in the DBLQUOTE rule:
   ... | '¬' NEWLINE ...
And then add additional 'replace' calls in the 
action code to replace "¬\r\n" or "¬\n" with "".
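For illustration, the extra 'replace' calls might look like the following in Java. The class and method names are hypothetical, and the sketch assumes the continuation marker is '¬' (U+00AC) immediately followed by the line ending. The CRLF form is handled first, then the bare LF form.

```java
// Hypothetical helper (names are illustrative, not from ANTLR).
final class Continuations {
    // Removes each continuation marker together with the line break after it,
    // joining the continued lines into one.
    static String stripContinuations(String text) {
        return text
            .replace("\u00AC\r\n", "")  // marker + Windows line ending
            .replace("\u00AC\n", "");   // marker + Unix line ending
    }
}
```

A '¬' that is not directly followed by a line break is left untouched.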


