[antlr-interest] Non-disjoint tokens

Steve Bennett stevagewp at gmail.com
Mon Dec 3 03:06:54 PST 2007


On 12/3/07, Gavin Lambert <antlr at mirality.co.nz> wrote:
> Because you can only invert sets, not sequences.  In other words,
> this works:
>
> fragment DIGIT: '0'..'9';
> DIGITS: DIGIT+;
> NONDIGITS: (~DIGIT)+;
<snip>
> Note that ('a' | 'b') is still a set, so can be inverted; 'ab' is
> a sequence, and can't be.  (And all of these examples assume
> you're in the lexer -- the rule is the same in the parser but it
> presents itself differently, since each item in the set can be a
> complete token instead of just a single character.  Though
> inverting in the parser isn't common anyway.)

Thanks for that explanation. I think I was confusing the lexer and
parser slightly. What I wanted was "anything other than a sequence of
digits" rather than "a sequence of anything other than digits".
Decomposing it as you suggest works.

Steve
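
For readers following the thread, the two readings Steve contrasts can be sketched outside ANTLR. This is only a Python illustration using regular expressions, not the grammar from the snipped part of Gavin's message; the function names are made up for the example, and "digits" is assumed to mean ASCII '0'..'9' as in the DIGIT fragment above.

```python
import re

# Assumption: "digit" means ASCII '0'..'9', mirroring the DIGIT fragment.
DIGITS = re.compile(r"[0-9]+")
NONDIGITS = re.compile(r"[^0-9]+")


def is_sequence_of_nondigits(text: str) -> bool:
    """'A sequence of anything other than digits': every character is a non-digit."""
    return NONDIGITS.fullmatch(text) is not None


def is_not_a_digit_sequence(text: str) -> bool:
    """'Anything other than a sequence of digits': the text as a whole is not all digits."""
    return DIGITS.fullmatch(text) is None


if __name__ == "__main__":
    # "a1" is where the two readings disagree: it contains a digit,
    # yet it is not itself a pure run of digits.
    for sample in ["123", "abc", "a1"]:
        print(sample, is_sequence_of_nondigits(sample), is_not_a_digit_sequence(sample))
```

The first function corresponds to inverting a set and repeating it, which is what (~DIGIT)+ does. The second negates a whole sequence, which is the thing that cannot be written as a single inversion and has to be decomposed instead.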

