[antlr-interest] How to force a TreeWalker to match exactly?
Bryan Ewbank
ewbank at gmail.com
Mon Mar 20 16:47:32 PST 2006
One thing that will help is to share token types using the import mechanism.
We have the Parser produce a token file, and the TreeParsers use that token
*.txt file as their token set. To audit, use diff of the Parser's *.txt file
with each TreeParser's *.txt file.
Another thing is a light-weight tree audit pass (one rule, one lookup table);
We implemented tree audit passes with a rule and a lookup table:
node : #(op:. { audit(op->getType(), op->numKids()); } ( node )* );
The audit() method looks up a node type in a table, and verifies that the
number of kids is correct. It can also report if there's a node that is not in
the table of NodeType->#kids map.
We also use more precise audit passes that define the contract between passes
by detecting correctly formed trees:
noMoreKids[AST root] : ( . { report failure here; } )? ;
expr : #(op:EXPRESSION value noMoreKids[#op]);
value : add | sub | mult | div | .... ;
add : #(op: ADD value value noMoreKids[#op] );
sub : #(op: SUB value value noMoreKids[#op] );
mult : #(op: MULT value value noMoreKids[#op] );
div : #(op: DIV value value noMoreKids[#op] );
// etc
Hope this helps,
- Bryan E
On 3/20/06, Koehne Kai <Kai.Koehne at student.hpi.uni-potsdam.de> wrote:
> Hi,
>
> I have used AntLR for a while now, and thought I knew most of the traps. I was wrong :-(
>
> I am working on a compiler that started as a multi-person project. We decided
> to separate the different compiler stages in different grammars: A lexer
> generates tokens, a parser constructs an AST, which all the following stages
> consume. These stages are mostly written as TreeWalkers, which all inherit
> from an abstract TreeWalker grammar describing the output of the parser ... I
> think this is the standard way to do it with AntLR.
>
> However, I just detected that the compiler fails to reject a lot of incorrect
> programs! This happens because the parser sometimes generates AST nodes that
> the TreeWalker grammar just silently ignores ... First I thought that there
> is something wrong with AntLR or with my error-handling code. But then I
> reread the documentation, and there it is:
>
> "An important thing to remember when specifying tree patterns and tree
> grammars in general is that sufficient matches are done not exact matches. As
> long as the tree satistfies the pattern, a match is reported, regardless of
> how much is left unparsed."
>
> This can be very annoying! If there is an inconsistency between the Parser
> and the TreeWalker grammar, there is very little chance to detect it.
> Instead, unknown nodes are sometimes silently ignored. Is there a way to turn
> off this feature? How do you ensure that the TreeWalker grammars really
> correspond to the AST generated by the parser?
>
> Regards,
>
> Kai Koehne
>
>
More information about the antlr-interest
mailing list