[antlr-interest] The unary not (~) vs. W3C EBNF dash operator

Wincent Colaiuta win at wincent.com
Mon Oct 8 14:11:03 PDT 2007


El 8/10/2007, a las 22:36, Loring Craymer escribió:

> The W3C spec uses a bastardized extension of BNF that
> supports pattern matching and comparison of matched
> patterns (the '-' operator that you are concerned
> with).  To process that requires re-lexing of input;
> ANTLR does not support that, nor does any other sane
> lexer generator.

I may have misinterpreted you but I think you're wrong about that.  
The operation that Andreas talks about is implemented in Ragel, for  
example, as a "difference" operator. Ragel (<http://www.cs.queensu.ca/ 
~thurston/ragel/>) is a (very sane) finite state machine compiler  
which implements the operator as follows:

> The difference operation produces a machine that matches strings  
> that are in machine one but are not in machine two. To achieve  
> subtraction, a union is performed on the two machines. After the  
> result has been made deterministic, any final state that came from  
> machine two or is a combination of states involving a final state  
> from machine two has its final state status revoked. As with  
> intersection, the operation is completed by pruning any path that  
> does not lead to a final state. The following example demonstrates  
> the use of subtraction to exclude specific cases from a set.

So this is a straightforward combination of two finite state machines  
and no re-lexing is required in order to implement it. If you look at  
the Ragel user manual (<http://www.cs.queensu.ca/~thurston/ragel/ 
ragel-guide-5.24.pdf>) you can see some nice state machine diagrams  
that illustrate exactly what's being done.

Best wishes,
Wincent



More information about the antlr-interest mailing list