[antlr-interest] How to force a TreeWalker to match exactly?

Bryan Ewbank ewbank at gmail.com
Mon Mar 20 16:47:32 PST 2006


One thing that will help is to share token types using the import mechanism.
We have the Parser produce a token file, and the TreeParsers use that token
*.txt file as their token set.  To audit, use diff of the Parser's *.txt file
with each TreeParser's *.txt file.

Another thing is a light-weight tree audit pass (one rule, one lookup table);
We implemented tree audit passes with a rule and a lookup table:

   node : #(op:. { audit(op->getType(), op->numKids()); } ( node )* );

The audit() method looks up a node type in a table, and verifies that the
number of kids is correct.  It can also report if there's a node that is not in
the table of NodeType->#kids map.

We also use more precise audit passes that define the contract between passes
by detecting correctly formed trees:

   noMoreKids[AST root] : ( . { report failure here; } )? ;

   expr : #(op:EXPRESSION value noMoreKids[#op]);
   value : add | sub | mult | div | .... ;
   add : #(op: ADD value value noMoreKids[#op] );
   sub : #(op: SUB value value noMoreKids[#op] );
   mult : #(op: MULT value value noMoreKids[#op] );
   div : #(op: DIV value value noMoreKids[#op] );
   // etc

Hope this helps,
- Bryan E

On 3/20/06, Koehne Kai <Kai.Koehne at student.hpi.uni-potsdam.de> wrote:
> Hi,
>
> I have used AntLR for a while now, and thought I knew most of the traps. I was wrong :-(
>
> I am working on a compiler that started as a multi-person project. We decided
> to separate the different compiler stages in different grammars: A lexer
> generates tokens, a parser constructs an AST, which all the following stages
> consume. These stages are mostly written as TreeWalkers, which all inherit
> from an abstract TreeWalker grammar describing the output of the parser ... I
> think this is the standard way to do it with AntLR.
>
> However, I just detected that the compiler fails to reject a lot of incorrect
> programs! This happens because the parser sometimes generates AST nodes that
> the TreeWalker grammar just silently ignores ... First I thought that there
> is something wrong with AntLR or with my error-handling code. But then I
> reread the documentation, and there it is:
>
> "An important thing to remember when specifying tree patterns and tree
> grammars in general is that sufficient matches are done not exact matches. As
> long as the tree satistfies the pattern, a match is reported, regardless of
> how much is left unparsed."
>
> This can be very annoying! If there is an inconsistency between the Parser
> and the TreeWalker grammar, there is very little chance to detect it.
> Instead, unknown nodes are sometimes silently ignored. Is there a way to turn
> off this feature? How do you ensure that the TreeWalker grammars really
> correspond to the AST generated by the parser?
>
> Regards,
>
> Kai Koehne
>
>


More information about the antlr-interest mailing list