[antlr-interest] Problem coding Antlr grammar for strings

Sun Jul 19 03:39:44 PDT 2009

2009/7/19 Gavin Lambert <antlr at mirality.co.nz>

> At 07:51 19/07/2009, Luís Reis wrote:
>
>> STRINGCONST
>>  : ('@"' ( options {greedy=false;} : . )* '"') // Accepts lots of stuff, including newlines
>>  | ('"' (
>>    (
>>      '\\' ('\\' | '"' | 'n' | 't' | OCTALCHAR)
>>    ) | (
>>      ~('"'|'\\'|LINEBREAK)
>>    )
>>  )* '"')
>>  ;
>>
>> It correctly matches "", "\\" and "\na", but fails for "abc" (with a
>> MismatchedTokenException).
>> However, I cannot understand *why* it fails for "abc"!
>>
>
> Best guess: it's LINEBREAK's fault.  Within a ~ block you can only use sets
> (alternatives of single characters).  Most likely, you've defined LINEBREAK
> as a sequence (can match two characters, if it sees '\r\n'; possibly even
> more if you've used a * or +).  This subtly breaks the ~ operation in
> strange ways.
>
> Try replacing LINEBREAK above with '\r'|'\n' and see if that helps.
>
> (Another possibility you should consider is to actually accept linebreaks
> in the non-@ strings at lexing time, but then raise an error at
> parse/tree-parse time that it's not valid to have a line-break in there.)
>
> Also: if you're trying to match C#-like strings then you'll need to modify
> the first alt a bit to support escaped quotes.
>
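Gavin's parenthetical alternative (let the non-@ alternative accept line breaks, then reject them after lexing) can be sketched as a separate validation pass. The sketch below is Python rather than ANTLR, and everything in it is hypothetical: StringToken stands in for whatever token object your target produces, and check_no_linebreaks is an illustrative name. In a real ANTLR grammar the same check would live in a parser or tree-parser action.

```python
from typing import List, NamedTuple


class StringToken(NamedTuple):
    """Hypothetical stand-in for a lexed STRINGCONST token."""
    text: str  # raw token text, including quotes (and the @ prefix, if any)
    line: int  # line on which the token starts


def check_no_linebreaks(tokens: List[StringToken]) -> List[str]:
    """Report regular (non-@) strings that contain a line break.

    Assumes the lexer was relaxed to let line breaks through in regular
    strings, so the error is reported here instead of as a lexer mismatch.
    """
    errors = []
    for tok in tokens:
        if tok.text.startswith("@"):
            continue  # verbatim strings are allowed to span lines
        if "\r" in tok.text or "\n" in tok.text:
            errors.append(f"line {tok.line}: line break inside a non-verbatim string")
    return errors


# Usage: only the third token is rejected.
tokens = [
    StringToken('"abc"', 1),
    StringToken('@"a\nb"', 2),
    StringToken('"a\nb"', 3),
]
print(check_no_linebreaks(tokens))  # prints ['line 3: line break inside a non-verbatim string']
```

The appeal of this split is that a stray line break produces a targeted message tied to a line number, rather than a generic MismatchedTokenException from the lexer.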

I am using

fragment LINEBREAK
    :    '\u000D'
    |    '\n'
    ;

and the problem still persists. The diagram on the right-hand side of the
ANTLRWorks interpreter shows:
MismatchedTokenException(-1!=11)
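Note that '\u000D' is simply '\r', so this fragment is already the single-character set Gavin suggested. As a cross-check of what STRINGCONST is meant to accept, independent of ANTLR and the ANTLRWorks interpreter, here is a small hand-written Python recognizer that mirrors both alternatives. It is a sketch under assumptions: OCTALCHAR is taken to be a single digit 0-7 (its definition is not shown in the thread), the excluded characters are written out as the set '"', '\\', '\r', '\n', and the @ alternative treats a doubled "" as an escaped quote, as C# verbatim strings do, following Gavin's escaped-quote remark. The function names are illustrative only, and the sketch says nothing about the lexer ANTLR actually generates.

```python
# Sketch only: mirrors the intended STRINGCONST rule by hand, not ANTLR's output.
OCTAL_DIGITS = "01234567"           # assumption: OCTALCHAR is one digit 0-7
SIMPLE_ESCAPES = '\\"nt'            # characters allowed after a backslash
EXCLUDED = {'"', "\\", "\r", "\n"}  # ~('"'|'\\'|'\r'|'\n') as a set of single chars


def match_regular(s: str, i: int) -> int:
    """Match a "..." string starting at s[i]; return the end index or -1."""
    if i >= len(s) or s[i] != '"':
        return -1
    i += 1
    while i < len(s):
        c = s[i]
        if c == '"':
            return i + 1  # closing quote
        if c == "\\":
            nxt = s[i + 1] if i + 1 < len(s) else ""
            if nxt and (nxt in SIMPLE_ESCAPES or nxt in OCTAL_DIGITS):
                i += 2  # valid escape sequence
                continue
            return -1  # bad or truncated escape
        if c in EXCLUDED:
            return -1  # only '\r' or '\n' can get here
        i += 1
    return -1  # input ended before the closing quote


def match_verbatim(s: str, i: int) -> int:
    """Match an @"..." string at s[i], where "" stands for one quote (C#-style)."""
    if not s.startswith('@"', i):
        return -1
    i += 2
    while i < len(s):
        if s[i] == '"':
            if i + 1 < len(s) and s[i + 1] == '"':
                i += 2  # doubled quote is an escaped quote
                continue
            return i + 1  # closing quote
        i += 1  # anything else, including line breaks
    return -1


def is_stringconst(s: str) -> bool:
    """True if the whole of s is exactly one STRINGCONST token."""
    end = match_verbatim(s, 0) if s.startswith("@") else match_regular(s, 0)
    return end == len(s)


print(is_stringconst('"abc"'))    # True
print(is_stringconst('"a\nb"'))   # False: line break in a regular string
print(is_stringconst('@"a\nb"'))  # True: verbatim strings may span lines
```

Since the intended language clearly includes "abc", the mismatch is more likely in how the rule is being run than in what it describes.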