[antlr-interest] various question on tree parsing

Hardy, Stephen Stephen_Hardy at rabbit.com
Sat Jul 21 18:05:38 PDT 2007


Lloyd,

 

I have a similar sort of problem.  I use a couple of solutions to
rewriting the tree:

 

One solution is to add new "imaginary tokens" that don't exist in the
input syntax, but can be recognized by the tree grammar.  This works in
the AST construction phase, i.e. the parser (not the tree parser).  You
then change the tree parser to accept the new tokens and ignore the
"raw" tokens.  For example, in your case of constant folding arithmetic
expressions, you might add a new token called CONST_RESULT and allow
expression trees to be of the form

 

  ^(binary_operator  expression expression (CONST_RESULT const_value)? )

 

Then, you also change the AST parser to recognize the optional
CONST_RESULT and use it instead of the two sub-expressions if it exists.
Note that the addition of the CONST_RESULT subtree is performed in the
@after{...} block, which is also a good place to trim the original
expression subtrees if desired.

 

Another solution is to evaluate and rewrite the expression tree, also in
the @after{...} block, using the CommonTree object methods.  This would
avoid the necessity to create new imaginary tokens.  In this example,
you might completely rewrite $retval.tree to be the tree which
represents a constant.

 

For my application, I like the first approach better, since information
is simply augmented rather than being permanently lost.  The small
disadvantage is that the AST parser is no longer such a close analogy to
the rewrite rules in the original grammar.

 

Regards,

SJH

 

 

________________________________

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Lloyd Dupont
Sent: Saturday, July 21, 2007 4:59 PM
To: antlr-interest at antlr.org
Subject: [antlr-interest] various question on tree parsing

 

I have finished my recognizer (i.e. parser + lexer), tested it with
various input, and now I want to do something with the result.

as I build an AST my 1st idea was to build a tree parser.

 

- here comes the 1st question:

 

yesterday I read the book about tree grammar, apparently tree grammar
cannot output tree.

so how can I do constant reduction with TreeGrammar?

I mean let's say I have the following tree fragment

 

^(PLUS ID["a"] ^(PLUS INT["3"] INT["4"]))

 

I would like to simplify it as

^(PLUS ID["a"] INT["7"])

 

and while a tree grammar "seem" perfect for that, I wonder how to get
around the fact it cannot output an other tree?
(maybe modifying the input tree itself? how do I do that?)

 

 

- 2nd:

let's say I have this tree instead

^(PLUS INT["3"] ^(PLUS ID["a"] INT["4"]))

 

it could benefit from the same reduction but it is less obvious, any tip
on how to do that?

 

 

- 3rd:

I was thinking to do multiple analysis pass on the tree: variable
checking, constant reduction, code generation.

and while ANTLR book recommand to do (multiple?) tree parser I'm
thinking than generating my own class of tree instead of AST and simply
do the work in code with special method in my tree implementation seems
easier.

What do you think?

 

Like the above question: I could call a method "simplify" on my tree and
it would recursively check itself for binary expression with 2 constant
and simplify them. That solve my problem that tree grammar cannot output
tree.

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20070721/ffcdab31/attachment.html 


More information about the antlr-interest mailing list