[antlr-interest] The unary not (~) vs. W3C EBNF dash operator

Fri Oct 26 20:40:29 PDT 2007

>Andreas Ravnestad wrote:
>> On 10/8/07, Johannes Luber <jaluber at gmx.de> wrote:
>>> ANTLR doesn't support this dash notation. For now the only correct way
>>> is to create the subset manually.
>>>
>>> Johannes Luber
>>
>> I see, how unfortunate :) Being new to ANTLR, I am not quite sure how
>> to "create the subsets manually". Could you point me to an example?
>>
>> -Andreas
>>
>I don't know of any example. I would simply create a rule which matches the
>subset without referencing the "parent".
>
>Johannes Luber

I posted the following related question to the forum
(http://www.jguru.com/forums/view.jsp?EID=1349571) but have since
discovered that the real discussion is here. Like Andreas, I am new to
ANTLR and do not understand how to translate this spec into ANTLR
format...

In the XML 1.0 Spec there are the following rules:

CDATA Sections
[18]  CDSect ::=  CDStart CData CDEnd
[19]  CDStart ::=  '<![CDATA['
[20]  CData ::=  (Char* - (Char* ']]>' Char*))
[21]  CDEnd ::=  ']]>'

[2]  Char ::=  #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
               | [#x10000-#x10FFFF]

where
A - B matches any string that matches A but does not match B.
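
As far as I can tell, that definition means rule [20] accepts any run of
Char that does not contain ']]>' anywhere. Here is a small Python sketch
of that reading, just to pin down the semantics. The function names are
my own, and this is not ANTLR syntax:

```python
def is_xml_char(c: str) -> bool:
    """Rule [2]: is this single character a legal XML 1.0 Char?"""
    cp = ord(c)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def matches_cdata(s: str) -> bool:
    """Rule [20]: CData ::= (Char* - (Char* ']]>' Char*))."""
    # A = Char*: every character of s is a legal Char.
    matches_a = all(is_xml_char(c) for c in s)
    # B = Char* ']]>' Char*: a Char string with ']]>' somewhere inside.
    matches_b = matches_a and "]]>" in s
    # A - B: matches A but does not match B.
    return matches_a and not matches_b
```

Under this reading, matches_cdata("a < b") is true and
matches_cdata("a]]>b") is false, so the content of a CDATA section is
simply everything up to the first ']]>'.
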
I have the ANTLR3 e-book, but I am still unclear on how to convert the
spec for CData above (rule [20]) into ANTLR grammar format.

Is there a simple explanation that I have missed?

regards,
david