[antlr-interest] Syntactic anti-predicates

Darach Ennis darach at gmail.com
Sun Feb 10 14:16:10 PST 2008


Hi Steve, all.

Brief delurk! I had a similar situation recently and found that the test
grammar below worked. My logic is slightly different: between a ',' and a
').' arbitrary token sequences may be present. My difficulty is in matching
the terminating ')', as ')' can also appear inside the arbitrary token
sequences, *but* I'm guaranteed that I will always see a '(' before a ')'
within a valid 'arbitrary token sequence'. So I can use those facts to
correctly identify the terminal ')' for the arbitrary code block and/or fail
accordingly:

grammar Test;

@parser::header {
package testing;

import java.io.*;
}

@lexer::header {
package testing;

}

@lexer::members {
 int pc = 0;
}

@parser::members {
 public static void main(String args[]) throws Throwable {
       final ANTLRInputStream cs = new ANTLRInputStream(
               new FileInputStream("resources/test.txt"));
       final TestLexer el = new TestLexer(cs);
       final CommonTokenStream et = new CommonTokenStream(el);
       final TestParser ep = new TestParser(et);
       ep.test();
 }
}

test:   (B (t=T { System.out.println("T: " + t.getText()); })? E)+;

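// pc counts currently open '(' so that only an unbalanced ')' ends the block.
T   : { pc = 0; } ','
      ( ~(')'|'(')
      | WS
      | '(' { pc++; }
      | { pc > 0 }?=> ')' { pc--; }   // ')' is only viable while a '(' is open
      )*
      ')' '.'
    ;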
B:  '<b>';
E:  '<e>';
WS  :   (' ' | '\t' | '\n' | '\r' | '\f') { $channel=HIDDEN; };

Here is some test content:

<b><e>
<b> , {} arbitrary code [] ).<e>
<b> , () () () () () ()

() {} []
). <e>

Here is the test/debug output:

T: , {} arbitrary code [] ).
T: , () () () () () ()

() {} []
).

The gated predicate ( ... { pc > 0 }?=> ')' ... ) and parenthesis 'reference
counting' are what allow this to work, but I've had little success with more
complex contexts, such as trying to match more than a single character to
terminate the arbitrary token sequence block.
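
For illustration only, the same counting idea can be sketched in plain Java,
outside ANTLR. This is a rough sketch under stated assumptions, not what the
generated lexer actually does: the class name ParenCountingSketch and the
method matchEnd are hypothetical, and the scan simply mirrors the T rule (a
leading ',', a counter for open '(' and a ')' that only terminates the block
when the counter is zero and a '.' follows immediately).

```java
// Hypothetical stand-alone sketch of the counting logic behind rule T.
// Not ANTLR-generated code; names are made up for illustration.
final class ParenCountingSketch {

    /**
     * Scans s from index start, which must hold ','.
     * Returns the index just past the terminating ")." or -1 if the
     * block is never terminated or an unbalanced ')' is not followed by '.'.
     */
    static int matchEnd(String s, int start) {
        if (start >= s.length() || s.charAt(start) != ',') {
            return -1;
        }
        int pc = 0; // number of currently open '('
        for (int i = start + 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                pc++;
            } else if (c == ')') {
                if (pc > 0) {
                    pc--; // closes an earlier '(' inside the block
                } else {
                    // Unbalanced ')': it must be the terminator, so '.' has to follow.
                    boolean dotFollows = i + 1 < s.length() && s.charAt(i + 1) == '.';
                    return dotFollows ? i + 2 : -1;
                }
            }
            // Every other character is part of the arbitrary sequence.
        }
        return -1; // ran out of input before seeing the terminating ")."
    }

    public static void main(String[] args) {
        String a = ", {} arbitrary code [] ).<e>";
        String b = ", () () ()\n\n() {} []\n). <e>";
        System.out.println("T: " + a.substring(0, matchEnd(a, 0)));
        System.out.println("T: " + b.substring(0, matchEnd(b, 0)));
    }
}
```

A return value of -1 plays the role of the lexer failing to match T, for
example when every ')' in the input is balanced by an earlier '('.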

The same grammar does not produce the same output or behavior when run in
the ANTLRWorks interpreter with the same input data. Having the book
definitely helps too.

In my case I'm trying to build a preprocessor, so I've begun referring to
the cpp grammar (http://www.antlr.org/grammar/1166665121622/Cpp.tar) by
YoungKi KU to see if I can grok some ideas or learn something from that.


Regards,

Darach.





On Feb 9, 2008 5:53 PM, Steve Bennett <stevagewp at gmail.com> wrote:
> Is there a convenient way to say "if the upcoming tokens look like X Y
> Z" then *don't* match this rule? It seems I always have to resort to a
> semantic predicate like this:
>
> {input.LA(1) != X && input.LA(2) != Y}? =>  R
>
> or some complicated rule that amounts to not X Y Z:
>
> (notXYZ) => R
>
> Is there a simple way I'm missing?
>
> Thanks,
> Steve
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.jpg
Type: image/jpeg
Size: 9260 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20080210/3aaef570/attachment-0001.jpg 

