[antlr-interest] Problem coding Antlr grammar for strings
Luís Reis
luiscubal at gmail.com
Sun Jul 19 03:39:44 PDT 2009
2009/7/19 Gavin Lambert <antlr at mirality.co.nz>
> At 07:51 19/07/2009, Luís Reis wrote:
>
>> STRINGCONST
>> : ('@"' ( options {greedy=false;} : . )* '"') // Accepts lots of stuff, including newlines
>> | ('"' (
>> (
>> '\\' ('\\' | '"' | 'n' | 't' | OCTALCHAR)
>> ) | (
>> ~('"'|'\\'|LINEBREAK)
>> )
>> )* '"')
>> ;
>>
>> This correctly matches "", "\\" and "\na", but fails for "abc" (with
>> MismatchedTokenException).
>> However, I cannot understand *why* it fails for "abc"!
>>
>
> Best guess: it's LINEBREAK's fault. Within a ~ block you can only use sets
> (alternatives of single characters). Most likely, you've defined LINEBREAK
> as a sequence (can match two characters, if it sees '\r\n'; possibly even
> more if you've used a * or +). This subtly breaks the ~ operation in
> strange ways.
>
> Try replacing LINEBREAK above with '\r'|'\n' and see if that helps.
>
> (Another possibility you should consider is to actually accept linebreaks
> in the non-@ strings at lexing time, but then raise an error at
> parse/tree-parse time that it's not valid to have a line-break in there.)
>
> Also: if you're trying to match C#-like strings then you'll need to modify
> the first alt a bit to support escaped quotes.
>
I am using:
fragment LINEBREAK
: '\u000D'
| '\n'
;
and the problem still persists. The diagram on the right of the ANTLRWorks
interpreter shows:
MismatchedTokenException(-1!=11)
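
For reference, the replacement suggested in the quoted reply (inlining the
character literals instead of referencing LINEBREAK inside the ~ set) would
look roughly like the sketch below. It assumes ANTLR 3 syntax and that
OCTALCHAR is a fragment rule defined elsewhere in the grammar. The '""'
alternative in the verbatim branch is an assumption based on C#'s
doubled-quote escape. This is untested against the grammar in question, and
whether it resolves the MismatchedTokenException is not confirmed here.

```antlr
STRINGCONST
    // Verbatim string: may span lines.
    // Assumption: '""' stands for an escaped quote, as in C#.
    : '@"' ( '""' | ~'"' )* '"'
    // Regular string: either an escape sequence, or any single character
    // other than a quote, a backslash, CR, or LF.
    // OCTALCHAR is assumed to be defined elsewhere as a fragment rule.
    | '"' ( '\\' ('\\' | '"' | 'n' | 't' | OCTALCHAR)
          | ~('"' | '\\' | '\r' | '\n')
          )* '"'
    ;
```

The point of the sketch is that the ~ set contains only single-character
literals, which is what the quoted reply says the operator requires.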