[antlr-interest] 'match not' not working
David-Sarah Hopwood
david-sarah at jacaranda.org
Tue Aug 4 16:13:18 PDT 2009
Tom wrote:
> I have a pretty simple grammar to construct; it must find balanced
> tokens in a stream. For example:
> aaa %{ bbb }% ccc
>
> These may be nested:
> aaa %{ aaa %{ bbb }% ccc }% ccc
>
> or not present at all:
> aaa
>
> or consecutive:
> aaa %{ bbb }% ccc %{ bbb }% ccc
>
> So all I need to do is find the tokens with non-tokens in between. This
> is the grammar I expected that would do that:
>
> /*------------------------------------------------------------------
> * PARSER RULES
> *------------------------------------------------------------------*/
>
> parse: loop;
>
> loop: noloop LOOPSTART loop LOOPEND loop
> | noloop
> ;
>
> noloop: (~( LOOPSTART | LOOPEND ))*
> ;
>
> /*------------------------------------------------------------------
> * LEXER RULES
> *------------------------------------------------------------------*/
>
> LOOPSTART: '%{';
> LOOPEND: '}%';
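The problem is that your lexer defines no tokens other than '%{' and '}%'.
In a parser rule, ~( LOOPSTART | LOOPEND ) means "any token other than
these two", not "any character", so there is nothing left for it to match.
What you want is for '%{' and '}%' to be treated like keywords, and
to add another lexer rule that will match anything else: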
// untested
tokens {
    LOOPSTART = '%{';
    LOOPEND = '}%';
}

// parse and loop as above

noloop: OTHER*
    ;

OTHER: .;
Note that OTHER is not ambiguous with LOOPSTART or LOOPEND because the
latter are declared in the tokens block.
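To make the intended behaviour concrete, here is a minimal Python sketch (not ANTLR, and not generated from the grammar) of what the suggested rules are meant to recognise. The names TOKEN_RE, tokenize and is_balanced are hypothetical, and two things are assumptions rather than part of the posted grammar: the two delimiters are simply tried before the catch-all OTHER, and the whole input must be consumed.

```python
import re

# Hypothetical stand-in for the lexer: the two delimiters are tried first,
# then any single character (including newlines) becomes OTHER.
TOKEN_RE = re.compile(
    r"(?P<LOOPSTART>%\{)|(?P<LOOPEND>\}%)|(?P<OTHER>.)", re.DOTALL
)


def tokenize(text):
    """Return the token type names found in text, in order."""
    return [m.lastgroup for m in TOKEN_RE.finditer(text)]


def is_balanced(text):
    """Recursive-descent check of

        loop:   noloop LOOPSTART loop LOOPEND loop | noloop ;
        noloop: OTHER* ;
    """
    tokens = tokenize(text)
    pos = 0

    def loop():
        nonlocal pos
        # noloop: skip any run of OTHER tokens
        while pos < len(tokens) and tokens[pos] == "OTHER":
            pos += 1
        if pos < len(tokens) and tokens[pos] == "LOOPSTART":
            pos += 1                      # LOOPSTART
            if not loop():                # nested loop
                return False
            if pos >= len(tokens) or tokens[pos] != "LOOPEND":
                return False              # missing '}%'
            pos += 1                      # LOOPEND
            return loop()                 # trailing loop
        return True                       # second alternative: noloop only

    # Assumption: all input must be consumed (the posted grammar has no EOF).
    return loop() and pos == len(tokens)
```

With this sketch, all four inputs from the original question are accepted, while an unclosed '%{' or a stray '}%' is rejected. It recurses once per loop, so it is only suitable for short inputs.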
> NB: if this works I will introduce a third token; an %{ with an id in
> between, written like: %id$
OTHER: ~'%';
ID: '%' ('a'..'z')+ '\$'; // for example
This assumes that a '%' which is not part of '%{' or '}%', and which
does not start a valid identifier terminated by '$', should be a
syntax error.
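The effect of narrowing OTHER can be illustrated with another small Python sketch. Again this is not ANTLR: TOKEN_RE and tokenize are hypothetical names, and trying the alternatives in a fixed order is an assumption that only approximates how a generated lexer chooses between rules.

```python
import re

# Hypothetical stand-in for the revised lexer rules:
#   LOOPSTART '%{', LOOPEND '}%', ID '%' ('a'..'z')+ '$', OTHER ~'%'
TOKEN_RE = re.compile(
    r"(?P<LOOPSTART>%\{)|(?P<LOOPEND>\}%)|(?P<ID>%[a-z]+\$)|(?P<OTHER>[^%])"
)


def tokenize(text):
    """Return (type, text) pairs, raising SyntaxError on a stray '%'."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            # Only a '%' that starts none of the tokens can end up here.
            raise SyntaxError(
                f"unexpected character at offset {pos}: {text[pos]!r}"
            )
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For example, "a%id$b" yields an OTHER, an ID and another OTHER, whereas "50% off" is rejected at the '%' because no rule can start there.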
--
David-Sarah Hopwood ⚥ http://davidsarah.livejournal.com