[antlr-interest] Syntactic anti-predicates

Mon Feb 11 16:27:12 PST 2008

Hello Darach and Steve,

> Is there a convenient way to say "if the upcoming tokens look like X Y
> Z" then *don't* match this rule?

I find that path leads to madness in a complex grammar.

An alternative approach that can be made to work is to
place dis-ambiguating semantic predicates on every other alternative.

> It seems I always have to resort to a
> semantic predicate like this:
>
> {input.LA(1) != X && input.LA(2) != Y}? =>  R
>
> or some complicated rule that amount to not X Y Z:
>
> (notXYZ) => R
>
> Is there a simple way I'm missing?

The left factoring approach you found later looks correct:

> By the time the syntactic predicate inside fails the match, it's too
> late.

So you understand the problem.

> I could left factor table_line:

> table_line:
>   (
>   (ws? PIPE PLUS) => ws? table_caption
>   | (ws? PIPE HYPHEN) => ws? table_row_separator
>   | (ws? EXCLAMATION) => ws? table_header_cells
>   | (ws? PIPE) => ws? table_data_cells
>   );
>
> But that's inelegant.

Looks elagant and like it might work to me.

> The only way to get around it at the moment that I can see is to
> either left factor as above, or use a semantic predicate - which I'm
> not really sure how to use with an arbitrary length rule like ws.

If you don't mind writing and debugging thousands of lines of your
own code, then dis-ambiguating semantic predicates are very powerful
for complex grammars.  Abitrary length rules and rules that call other
rules can be handled with iterative loops and little hand coded
recursive descent compilers.

Darach writes:
> The gated predicate ( ... { pc > 0?=> ')'  ... ) and parenthesis
>'reference counting'
> is what allows this to work but I've had limited/little success with
> more complex
> contexts such as trying to match more than a single character for
> terminating the
> arbitrary token sequence block.

You can write a dis-ambiguating semantic predicate the scans forward
matching parentheses.  Dis-ambiguating semantic predicates are called
multiple times, so it easier to debug to log the answers and cache
the result. I have attached an example to show how to log and cache
the results.

> The same grammar fails to produce the same output and/or exhibit the
> same behavior when interpreted by ANTLRworks with the same input data.

I think someone else noted recently on this list that the ANTLRworks
interpreter does not understand the gated predicate, and you would have
to use the debugger.

> Having the book definitely helps too.

The book is great.

Regards, Mark

-- 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Makefile
Type: application/octet-stream
Size: 1232 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20080212/fa747b62/attachment-0002.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SemanticActions.java
Type: text/x-java
Size: 4853 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20080212/fa747b62/attachment-0003.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: u.c
Type: text/x-csrc
Size: 14 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20080212/fa747b62/attachment-0004.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Unsigned.g
Type: application/octet-stream
Size: 692 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20080212/fa747b62/attachment-0003.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Unsigned.java
Type: text/x-java
Size: 3158 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20080212/fa747b62/attachment-0005.bin