[antlr-interest] The unary not (~) vs. W3C EBNF dash operator
Wincent Colaiuta
win at wincent.com
Mon Oct 8 14:11:03 PDT 2007
El 8/10/2007, a las 22:36, Loring Craymer escribió:
> The W3C spec uses a bastardized extension of BNF that
> supports pattern matching and comparison of matched
> patterns (the '-' operator that you are concerned
> with). To process that requires re-lexing of input;
> ANTLR does not support that, nor does any other sane
> lexer generator.
I may have misinterpreted you but I think you're wrong about that.
The operation that Andreas talks about is implemented in Ragel, for
example, as a "difference" operator. Ragel (<http://www.cs.queensu.ca/
~thurston/ragel/>) is a (very sane) finite state machine compiler
which implements the operator as follows:
> The difference operation produces a machine that matches strings
> that are in machine one but are not in machine two. To achieve
> subtraction, a union is performed on the two machines. After the
> result has been made deterministic, any final state that came from
> machine two or is a combination of states involving a final state
> from machine two has its final state status revoked. As with
> intersection, the operation is completed by pruning any path that
> does not lead to a final state. The following example demonstrates
> the use of subtraction to exclude specific cases from a set.
So this is a straightforward combination of two finite state machines
and no re-lexing is required in order to implement it. If you look at
the Ragel user manual (<http://www.cs.queensu.ca/~thurston/ragel/
ragel-guide-5.24.pdf>) you can see some nice state machine diagrams
that illustrate exactly what's being done.
Best wishes,
Wincent
More information about the antlr-interest
mailing list