[antlr-interest] Problem coding Antlr grammar for strings
Gavin Lambert
antlr at mirality.co.nz
Sat Jul 18 16:04:02 PDT 2009
At 07:51 19/07/2009, Luís Reis wrote:
>STRINGCONST
> : ('@"' ( options {greedy=false;} : . )* '"')
> //Accepts lots of stuff, including newlines
> | ('"' (
> (
> '\\' ('\\' | '"' | 'n' | 't' | OCTALCHAR)
> ) | (
> ~('"'|'\\'|LINEBREAK)
> )
> )* '"')
> ;
>
>Which matches correctly "", "\\" and "\na" but
>fails for "abc" (with MismatchedTokenException).
>However, I can not understand *why* it fails for "abc"!
Best guess: it's LINEBREAK's fault. Within a ~
block you can only use sets (alternatives of
single characters). Most likely, you've defined
LINEBREAK as a sequence (can match two
characters, if it sees '\r\n'; possibly even more
if you've used a * or +). Negating a sequence
like that breaks the ~ operation in subtle ways.
Try replacing LINEBREAK above with '\r'|'\n' and
see if that helps.
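
For illustration, here is roughly what the rule might look like with
that change applied. This is only a sketch: I'm assuming LINEBREAK was
meant to cover '\r' and '\n' (its definition wasn't posted), and that
OCTALCHAR is a fragment rule defined elsewhere in your grammar.

```
STRINGCONST
    : '@"' ( options {greedy=false;} : . )* '"'
    | '"'
      ( '\\' ('\\' | '"' | 'n' | 't' | OCTALCHAR)
      // the negated set now contains only single characters
      | ~('"' | '\\' | '\r' | '\n')
      )*
      '"'
    ;
```

If '\r' and '\n' are not the only characters LINEBREAK was supposed
to match, list the others individually inside the ~( ... ) as well.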
(Another possibility you should consider is to
actually accept linebreaks in the non-@ strings
at lexing time, but then raise an error at
parse/tree-parse time that it's not valid to have
a line-break in there.)
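
A rough sketch of that alternative, assuming the Java target. The
parser rule name stringLiteral and the wording of the message are
made up for the example, and OCTALCHAR is again assumed to be defined
elsewhere. The lexer no longer excludes line breaks, and the check
moves into a parser action:

```
STRINGCONST
    : '@"' ( options {greedy=false;} : . )* '"'
    | '"'
      ( '\\' ('\\' | '"' | 'n' | 't' | OCTALCHAR)
      | ~('"' | '\\')   // line breaks are accepted here
      )*
      '"'
    ;

// hypothetical parser rule; adapt to wherever strings are used
stringLiteral
    : s=STRINGCONST
      {
        String t = $s.text;
        // only the non-@ form forbids line breaks
        if (!t.startsWith("@")
            && (t.indexOf('\n') >= 0 || t.indexOf('\r') >= 0)) {
            System.err.println("line " + $s.line
                + ": line break in string constant");
        }
      }
    ;
```

The advantage is that the lexer always produces a token, so you can
report a more specific message than a generic lexer error. In a real
grammar you would probably route this through your own error
reporting rather than System.err.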
Also: if you're trying to match C#-like strings
then you'll need to modify the first alt a bit to
support escaped quotes.
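
Something along these lines might do for the first alt, assuming the
C# convention that a doubled quote ("") inside an @-string stands for
a single literal quote. The fragment name VERBATIMCHAR is just a
placeholder:

```
// would replace the first alternative of STRINGCONST
VERBATIMSTRING
    : '@"' VERBATIMCHAR* '"'
    ;

fragment
VERBATIMCHAR
    : '"' '"'     // doubled quote = one escaped quote
    | ~'"'        // anything else, including newlines
    ;
```

With this the non-greedy wildcard loop is no longer needed, since the
loop body can never consume a lone closing quote.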