[antlr-interest] Problem coding Antlr grammar for strings

Sat Jul 18 16:04:02 PDT 2009

At 07:51 19/07/2009, Luís Reis wrote:
>STRINGCONST
>   : ('@"' ( options {greedy=false;} : . )* '"') 
> //Accepts lots of stuff, including newlines
>   | ('"' (
>     (
>       '\\' ('\\' | '"' | 'n' | 't' | OCTALCHAR)
>     ) | (
>       ~('"'|'\\'|LINEBREAK)
>     )
>   )* '"')
>   ;
>
>Which correctly matches "", "\\" and "\na" but 
>fails for "abc" (with MismatchedTokenException).
>However, I cannot understand *why* it fails for "abc"!

Best guess: it's LINEBREAK's fault.  Within a ~ 
block you can only use sets (alternatives of 
single characters).  Most likely, you've defined 
LINEBREAK as a sequence (it can match two 
characters if it sees '\r\n', and possibly even 
more if you've used a * or +).  Since ~ is only 
meant for sets, negating a sequence like that 
subtly breaks it, with strange results.

Try replacing LINEBREAK above with '\r'|'\n' and 
see if that helps.
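
For illustration, the quoted rule with that substitution applied 
might look like the sketch below.  It is untested, and it assumes 
OCTALCHAR is still defined elsewhere in the grammar, as in the 
original post; the only change is the set inside the ~ block.

```antlr
STRINGCONST
  : '@"' ( options {greedy=false;} : . )* '"'
  | '"'
    ( '\\' ('\\' | '"' | 'n' | 't' | OCTALCHAR)  // escape sequences, unchanged
    | ~('"' | '\\' | '\r' | '\n')                // single characters only inside ~
    )*
    '"'
  ;
```

If LINEBREAK is used elsewhere in the grammar it can stay as it 
is; it just shouldn't appear inside the ~ block.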

(Another possibility you should consider is to 
actually accept linebreaks in the non-@ strings 
at lexing time, but then raise an error at 
parse/tree-parse time that it's not valid to have 
a line-break in there.)

Also: if you're trying to match C#-like strings 
then you'll need to modify the first alt a bit to 
support escaped quotes (in a C# @-string, a quote 
is escaped by doubling it, i.e. "").
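
One possible shape for that first alt, assuming C# rules for 
@-strings (a doubled quote stands for a literal quote and 
everything else, newlines included, is accepted as-is).  This is 
an untested sketch; VERBATIM_STRING is a hypothetical fragment 
name, not something from the original grammar.

```antlr
// Hypothetical helper; STRINGCONST would reference it in place of
// the first alternative.
fragment VERBATIM_STRING
  : '@"'
    ( '""'     // doubled quote: an escaped quote inside the string
    | ~'"'     // any other single character, including newlines
    )*
    '"'        // a lone quote ends the string
  ;
```

The non-greedy option is no longer needed here, since the loop 
can only exit on a quote that isn't followed by another quote.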