[antlr-interest] All except...
Gavin Lambert
antlr at mirality.co.nz
Tue Jun 5 04:52:00 PDT 2007
At 11:26 5/06/2007, Phil Oliver wrote:
>how does one create an ANTLR v3 rule (either lexer or parser)
>that easily matches any character EXCEPT a set of other
>characters? e.g. let's say I have:
>
>Char : '\u0009' | '\u000A' | '\u000D' | '\u0020'..'\uD7FF' |
>'\uE000'..'\uFFFD';
>
>and I want to define a rule that matches any Char except another
>list of characters. In the EBNF grammar used in the XQuery spec,
>for example, it would be:
>
>Char2: Char - ('<' | '>');
>
>which would cause Char2 to match any character in Char except for
>'<' or '>'. But that operator isn't part of ANTLR (evidently). I've
>looked at the ~ unary operator but that doesn't handle this job,
>unless I'm overlooking something.
ANTLR doesn't currently support set subtraction, though it does
support set union (through the | operator) and negation (with
~), and A - B can always be rewritten as ~(~A | B). Since your Char
definition is already a "just about everything" set, you should
first redeclare it in negated form:
Char: ~('\u0000'..'\u0008' | '\u000B' | '\u000C' |
'\u000E'..'\u001F' | '\uD800'..'\uDFFF' | '\uFFFE' | '\uFFFF');
(If your lexer's character vocabulary goes beyond 16-bit code units,
then the upper bound might be a bit different.)
Then factor out a fragment rule that also excludes the characters
you want to keep out of Char2, and define both Char and Char2 in
terms of it:
fragment Char0: ~('\u0000'..'\u0008' | '\u000B' | '\u000C' |
'\u000E'..'\u001F' | '<' | '>' | '\uD800'..'\uDFFF' | '\uFFFE' |
'\uFFFF');
Char: Char0 | '<' | '>';
Char2: Char0;
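As a sanity check on the set algebra (this is not ANTLR code), the rewrite can be modelled with plain Python sets. It is only a sketch: it assumes that ~ complements against the 16-bit range \u0000..\uFFFF, it treats U+0000 as outside Char (as in the positive definition from the question), and names such as VOCAB, rng and excluded are made up for illustration.

```python
# Model each lexer rule as a set of 16-bit code units.
# Assumption: ~ complements against 0x0000..0xFFFF.
VOCAB = set(range(0x0000, 0x10000))


def rng(lo, hi):
    """Inclusive range of code units, like 'lo'..'hi' in a grammar."""
    return set(range(lo, hi + 1))


# Char as written in the question (positive form).
char_positive = {0x09, 0x0A, 0x0D} | rng(0x20, 0xD7FF) | rng(0xE000, 0xFFFD)

# Everything outside Char within the assumed vocabulary (includes U+0000).
excluded = (rng(0x00, 0x08) | {0x0B, 0x0C} | rng(0x0E, 0x1F)
            | rng(0xD800, 0xDFFF) | {0xFFFE, 0xFFFF})

lt_gt = {ord('<'), ord('>')}

char_negated = VOCAB - excluded      # Char: ~( ... )
char0 = VOCAB - (excluded | lt_gt)   # fragment Char0: ~( ... | '<' | '>')
char_rule = char0 | lt_gt            # Char: Char0 | '<' | '>'
char2 = char0                        # Char2: Char0

# Each line prints True if the rewritten rule matches the intended set.
print(char_negated == char_positive)
print(char_rule == char_positive)
print(char2 == char_positive - lt_gt)
```

The last comparison is the point of the exercise: Char2 comes out as Char minus '<' and '>' even though no subtraction operator was used, only union and negation.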