[antlr-interest] various question on tree parsing

Sat Jul 21 18:10:57 PDT 2007

Interesting ideas....
Thanks for the tip!
  ----- Original Message ----- 
  From: Hardy, Stephen 
  To: antlr-interest at antlr.org 
  Sent: Sunday, July 22, 2007 11:05 AM
  Subject: Re: [antlr-interest] various question on tree parsing

  Lloyd,

  I have a similar sort of problem.  I use a couple of solutions to rewriting the tree:

  One solution is to add new "imaginary tokens" that don't exist in the input syntax, but can be recognized by the tree grammar.  This works in the AST construction phase, i.e. the parser (not the tree parser).  You then change the tree parser to accept the new tokens and ignore the "raw" tokens.  For example, in your case of constant folding arithmetic expressions, you might add a new token called CONST_RESULT and allow expression trees to be of the form

    ^(binary_operator  expression expression (CONST_RESULT const_value)? )

  Then, you also change the AST parser to recognize the optional CONST_RESULT and use it instead of the two sub-expressions if it exists.  Note that the addition of the CONST_RESULT subtree is performed in the @after{...} block, which is also a good place to trim the original expression subtrees if desired.

  Another solution is to evaluate and rewrite the expression tree, also in the @after{...} block, using the CommonTree object methods.  This would avoid the necessity to create new imaginary tokens.  In this example, you might completely rewrite $retval.tree to be the tree which represents a constant.

  For my application, I like the first approach better, since information is simply augmented rather than being permanently lost.  The small disadvantage is that the AST parser is no longer such a close analogy to the rewrite rules in the original grammar.

  Regards,

  SJH

------------------------------------------------------------------------------

  From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Lloyd Dupont
  Sent: Saturday, July 21, 2007 4:59 PM
  To: antlr-interest at antlr.org
  Subject: [antlr-interest] various question on tree parsing

  I have finished my recognizer (i.e. parser + lexer), tested it with various input, and now I want to do something with the result.

  as I build an AST my 1st idea was to build a tree parser.

  - here comes the 1st question:

  yesterday I read the book about tree grammar, apparently tree grammar cannot output tree.

  so how can I do constant reduction with TreeGrammar?

  I mean let's say I have the following tree fragment

  ^(PLUS ID["a"] ^(PLUS INT["3"] INT["4"]))

  I would like to simplify it as

  ^(PLUS ID["a"] INT["7"])

  and while a tree grammar "seem" perfect for that, I wonder how to get around the fact it cannot output an other tree?
  (maybe modifying the input tree itself? how do I do that?)

  - 2nd:

  let's say I have this tree instead

  ^(PLUS INT["3"] ^(PLUS ID["a"] INT["4"]))

  it could benefit from the same reduction but it is less obvious, any tip on how to do that?

  - 3rd:

  I was thinking to do multiple analysis pass on the tree: variable checking, constant reduction, code generation.

  and while ANTLR book recommand to do (multiple?) tree parser I'm thinking than generating my own class of tree instead of AST and simply do the work in code with special method in my tree implementation seems easier.

  What do you think?

  Like the above question: I could call a method "simplify" on my tree and it would recursively check itself for binary expression with 2 constant and simplify them. That solve my problem that tree grammar cannot output tree.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20070722/b975df26/attachment-0001.html