[antlr-interest] Syntactic anti-predicates

Darach Ennis darach at gmail.com
Tue Feb 12 06:24:30 PST 2008


Hi Mark, all.

I've tried the semantic action approach and it works an absolute charm,
thank you!

I can see why it scales nicely into complex grammars.

The disambiguating predicate approach introduced considerable cross-cutting
concerns
into both the lexer and parser. So semantic actions are a lot like 'aspects'
in that they
consolidate crosscutting concerns into a single place.

It is worth trying both approaches once just to go through the learning
experience... so
I'm glad I pulled some, but not all, of my hair out!

Here's what I did with my 'real' grammar. I moved my
'ArbitraryTokenSequence' definition
higher up the declaration order. I defined conflicting and/or intersecting
tokens higher
again. In this way I was able to coerce types accordingly. Also there are a
hell of a lot
more arbitrary tokens the preprocessor tokens in my case so this order
should keep
the semantic actions nice and small. The semantic action code was very easy
to test drive
and to debug in the usual way:

    private final int[] ARBITRARY_TOKENS = {
            TestParser.ArbitraryCodeSequence,
            TestParser.Whitespace,
            TestParser.Variable,
            TestParser.AtomLiteral
    };

    public boolean isOneOf(final int[] types, int candidate) {
        for(int t : types) {
            if (candidate == t) return true;
        }
        return false;
    }

    public boolean isArbitraryTokenSequence(TokenStream input) {
        int i = 0;
        do {
            final Token token;
            try {
                token = input.get(i);
            } catch(IndexOutOfBoundsException e) {
                break; // empty or arbitrary tokens until eof (eg: no
directives)
            }
            if (token == null) { break; }
            final int type = token.getType();
            if (!isOneOf(ARBITRARY_TOKENS,type)) {
                // It could be a conflicting token, we can coerce
accordingly ...
                if (type == TestParser.LeftParen) {
                    token.setType(TestParser.Separator); // it is only a
left paren we care about if it is within a directive rule, ie: we dont care
in this case
                } else if (type == TestParser.FullStop) {
                    token.setType(TestParser.Separator); // ditto, see above
                } else {
                    break;
                }
            }

            i++;
        } while(true);
        return (i > 0); // at least one token matches
    }

This semantic action above is fairly raw and still a work in progress but in
only 30 minutes I've been a lot
more productive than with the dis-ambiguating predicate approach and my
grammar is simpler and easier
to maintain as a result.


Regards,

Darach.



On Feb 12, 2008 12:27 AM, Mark Wright <markwright at internode.on.net> wrote:

> Hello Darach and Steve,
>
> > Is there a convenient way to say "if the upcoming tokens look like X Y
> > Z" then *don't* match this rule?
>
> I find that path leads to madness in a complex grammar.
>
> An alternative approach that can be made to work is to
> place dis-ambiguating semantic predicates on every other alternative.
>
> > It seems I always have to resort to a
> > semantic predicate like this:
> >
> > {input.LA(1) != X && input.LA(2) != Y}? =>  R
> >
> > or some complicated rule that amount to not X Y Z:
> >
> > (notXYZ) => R
> >
> > Is there a simple way I'm missing?
>
> The left factoring approach you found later looks correct:
>
> > By the time the syntactic predicate inside fails the match, it's too
> > late.
>
> So you understand the problem.
>
> > I could left factor table_line:
>
> > table_line:
> >   (
> >   (ws? PIPE PLUS) => ws? table_caption
> >   | (ws? PIPE HYPHEN) => ws? table_row_separator
> >   | (ws? EXCLAMATION) => ws? table_header_cells
> >   | (ws? PIPE) => ws? table_data_cells
> >   );
> >
> > But that's inelegant.
>
> Looks elagant and like it might work to me.
>
> > The only way to get around it at the moment that I can see is to
> > either left factor as above, or use a semantic predicate - which I'm
> > not really sure how to use with an arbitrary length rule like ws.
>
> If you don't mind writing and debugging thousands of lines of your
> own code, then dis-ambiguating semantic predicates are very powerful
> for complex grammars.  Abitrary length rules and rules that call other
> rules can be handled with iterative loops and little hand coded
> recursive descent compilers.
>
> Darach writes:
> > The gated predicate ( ... { pc > 0?=> ')'  ... ) and parenthesis
> >'reference counting'
> > is what allows this to work but I've had limited/little success with
> > more complex
> > contexts such as trying to match more than a single character for
> > terminating the
> > arbitrary token sequence block.
>
> You can write a dis-ambiguating semantic predicate the scans forward
> matching parentheses.  Dis-ambiguating semantic predicates are called
> multiple times, so it easier to debug to log the answers and cache
> the result. I have attached an example to show how to log and cache
> the results.
>
> > The same grammar fails to produce the same output and/or exhibit the
> > same behavior when interpreted by ANTLRworks with the same input data.
>
> I think someone else noted recently on this list that the ANTLRworks
> interpreter does not understand the gated predicate, and you would have
> to use the debugger.
>
> > Having the book definitely helps too.
>
> The book is great.
>
> Regards, Mark
>
> --
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20080212/9d62f496/attachment.html 


More information about the antlr-interest mailing list