[antlr-interest] 'match not' not working

David-Sarah Hopwood david-sarah at jacaranda.org
Tue Aug 4 16:13:18 PDT 2009


Tom wrote:
> I have a pretty simple grammar to construct; it must find balanced 
> tokens in a stream. For example:
>     aaa %{ bbb }% ccc
> 
> These may be nested:
>     aaa %{ aaa %{ bbb }% ccc }% ccc
> 
> or not present at all:
>     aaa
> 
> or consecutive:
>     aaa %{ bbb }% ccc %{ bbb }% ccc
> 
> So all I need to do is find the tokens with non-tokens in between. This 
> is the grammar I expected would do that:
> 
> /*------------------------------------------------------------------
>  * PARSER RULES
>  *------------------------------------------------------------------*/
>       
> parse: loop;
> 
> loop: noloop LOOPSTART loop LOOPEND loop
>     | noloop
>     ;
>    
> noloop: (~( LOOPSTART | LOOPEND ))*
>       ;
> 
> /*------------------------------------------------------------------
>  * LEXER RULES
>  *------------------------------------------------------------------*/
> 
> LOOPSTART: '%{';
> LOOPEND: '}%';

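The problem is that you have no tokens other than '%{' and '}%': the
lexer has no rule that matches any other input, so the parser never
receives a token for ~( LOOPSTART | LOOPEND ) to match. What you want
is for '%{' and '}%' to be treated like keywords, and to add another
lexer rule that will match anything else: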

  // untested

  tokens {
    LOOPSTART = '%{';
    LOOPEND   = '}%';
  }

  // parse and loop as above

  noloop: OTHER*
        ;

  OTHER: .;

Note that OTHER is not ambiguous with LOOPSTART or LOOPEND because the
latter are declared in the tokens block.
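
To make the intended behaviour concrete, here is a minimal hand-written
sketch in Python of what the lexer and parser above are meant to do. It
is not code generated by ANTLR, and the names tokenize and is_balanced
are made up for illustration. The loop rule is written in the
equivalent factored form noloop (LOOPSTART loop LOOPEND loop)?.

```python
# Hand-written illustration only; this is not code generated by ANTLR.
LOOPSTART, LOOPEND, OTHER = "LOOPSTART", "LOOPEND", "OTHER"


def tokenize(text):
    """Split text into (type, lexeme) pairs, preferring the two keywords."""
    tokens = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "%{":
            tokens.append((LOOPSTART, pair))
            i += 2
        elif pair == "}%":
            tokens.append((LOOPEND, pair))
            i += 2
        else:
            # OTHER: any single character that does not start a keyword here
            tokens.append((OTHER, text[i]))
            i += 1
    return tokens


def is_balanced(text):
    """Recognize  loop: noloop (LOOPSTART loop LOOPEND loop)? ;  noloop: OTHER* ;"""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def loop():
        nonlocal pos
        while peek() == OTHER:      # noloop: OTHER*
            pos += 1
        if peek() == LOOPSTART:     # LOOPSTART loop LOOPEND loop
            pos += 1
            if not loop():
                return False
            if peek() != LOOPEND:
                return False
            pos += 1
            return loop()
        return True                 # the plain noloop alternative

    # parse: loop, and the whole input must have been consumed
    return loop() and pos == len(tokens)
```

Under these assumptions, the four inputs from the original question are
all accepted, while an unmatched '%{' or a stray '}%' is rejected.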

> NB: if this works I will introduce a third token; an %{ with an id in 
> between, written like: %id$

  OTHER: ~'%';
  ID: '%' ('a'..'z')+ '\$';  // for example

This assumes that a '%' followed by anything other than '{', or other
than a valid identifier and then '$', should be a syntax error.
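
As a rough illustration of that assumption, here is a hand-written
Python sketch of a tokenizer with the extra ID token. It is not ANTLR
output; TOKEN_SPEC and tokenize_with_ids are hypothetical names, and
listing the keywords before OTHER in the regex alternation stands in
for the lexer preferring them.

```python
import re

# Hypothetical token table mirroring the lexer rules; order encodes priority.
TOKEN_SPEC = [
    ("LOOPSTART", r"%\{"),
    ("LOOPEND", r"\}%"),
    ("ID", r"%[a-z]+\$"),   # '%' ('a'..'z')+ '$'
    ("OTHER", r"[^%]"),     # ~'%'
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize_with_ids(text):
    """Return (type, lexeme) pairs; raise SyntaxError on a stray '%'."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # Only a '%' that starts neither '%{' nor '%id$' can get here.
            raise SyntaxError(f"unexpected {text[pos]!r} at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

With this table, an input such as "50% off" is rejected at the '%',
which is the syntax-error case described above.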

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com


