[antlr-interest] Fundamental tree parsing question

Tue Jul 10 11:45:56 PDT 2007

Hello,

Ted Villalba wrote:

> I'm stuck trying to figure out how to distinguish between multi-term
> values in a syntax tree.
> For instance, the following rule rewrites to a simple tree:
> 
> field    : tag '=' LPAREN value RPAREN -> ^('=' tag value)
> 
> Here the root node is '=', and the children are both tag and value. The
> problem is that if tag has multiple tokens and value has multiple tokens,
> there is no way (that I know of yet) to determine where 'tag' stops and
> 'value' starts.
> So something like :
> TAG A=(THE TREE TEST)
> 
> Will give you:
> 
> ^(= TAG A THE TREE TEST)
> 
> If I want to reference the value for tag at this point, I don't know how.
> 
> This would seem a basic problem, but I haven't found any similar examples
> in the literature. Has anyone run into this issue and how did you resolve
> it?

You could introduce imaginary token types TAGS and VALUES (one of them
would be sufficient, but using both keeps the tree symmetric). Then use

field: tag '=' LPAREN value RPAREN -> ^('=' ^(TAGS tag) ^(VALUES value)) ;

(or have the tag and value rules themselves return trees rooted at TAGS
and VALUES).
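
To make that a bit more concrete, here is a minimal sketch of how the pieces
might fit together in an ANTLR 3 grammar. The tag, value and WORD rules are
assumptions on my part (your post does not show them); only the field rule
is the one suggested above.

```antlr
// Sketch only: tag, value and WORD are assumed, not taken from the
// original grammar.
options { output=AST; }

// Imaginary tokens: never matched by the lexer, used only as tree roots.
tokens { TAGS; VALUES; }

field : tag '=' LPAREN value RPAREN
        -> ^('=' ^(TAGS tag) ^(VALUES value))
      ;

tag   : WORD+ ;   // assumed: one or more words
value : WORD+ ;   // assumed: one or more words
```

Under those assumptions, the input TAG A=(THE TREE TEST) should come out as
^(= ^(TAGS TAG A) ^(VALUES THE TREE TEST)), so a tree grammar can match the
two groups separately, for example with ^('=' ^(TAGS WORD+) ^(VALUES WORD+)).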

HTH

-- 
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/