[antlr-interest] How does AST construction work?

Mon May 16 10:20:56 PDT 2005

Try removing the "^" from your example and look at the results.  This
should help...

By default, ANTLR builds a list of all elements of a production - not a tree.

Each token that is matched in a production is appended to a list,
and that list is what the production returns.

You can override the default by using the "^" operator to make that
node the parent of the *current* list, and the "!" operator to prevent
something from being added to that list.  I emphasize current above
because it's possible to see more than one "^" operator in a
production - hard to grok, but sometimes very handy.

This model allows, for example, something like this:

   /*
    * result is #( LEFT_CURLY E1 E1 E1 E1 )
    */
   exprlist:  LEFT_CURLY^ expr_comma_list RIGHT_CURLY! ;
   expr_comma_list: expr ( COMMA! expr )* ;
   expr: E1;

to produce a single node representing the exprlist that contains every
expression matched by expr.  Generating lists rather than trees
isolates the shape of the resultant AST from the productions used to
match it.
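
If it helps to see the bookkeeping spelled out, here is a rough Python
sketch of the model described above. It is only a toy illustration, not
ANTLR's actual implementation: the names Node, Rule, add and make_root
are made up for this example, and the "parser" just pops token names
off a plain list.

```python
class Node:
    """A tree node: token text plus zero or more children."""
    def __init__(self, text, children=()):
        self.text = text
        self.children = list(children)

    def __repr__(self):
        if not self.children:
            return self.text
        parts = [self.text] + [repr(c) for c in self.children]
        return "#( " + " ".join(parts) + " )"


class Rule:
    """Hypothetical per-production state: an optional root and a sibling list."""
    def __init__(self):
        self.root = None
        self.siblings = []

    def add(self, nodes):
        # Default behaviour: append matched nodes to the current list.
        self.siblings.extend(nodes)

    def make_root(self, text):
        # "^": the new node becomes the parent of whatever has been built so far.
        current = self.result()
        self.root = text
        self.siblings = current

    def result(self):
        # A production always returns a list of nodes.
        if self.root is None:
            return list(self.siblings)
        return [Node(self.root, self.siblings)]


def expr(tokens):
    rule = Rule()
    rule.add([Node(tokens.pop(0))])       # E1 (no annotation: appended)
    return rule.result()


def expr_comma_list(tokens):
    rule = Rule()
    rule.add(expr(tokens))
    while tokens and tokens[0] == "COMMA":
        tokens.pop(0)                     # COMMA! (matched but not added)
        rule.add(expr(tokens))
    return rule.result()


def exprlist(tokens):
    rule = Rule()
    rule.make_root(tokens.pop(0))         # LEFT_CURLY^
    rule.add(expr_comma_list(tokens))     # sub-rule's list is spliced in flat
    tokens.pop(0)                         # RIGHT_CURLY!
    return rule.result()


tokens = ["LEFT_CURLY", "E1", "COMMA", "E1", "COMMA", "E1",
          "COMMA", "E1", "RIGHT_CURLY"]
print(exprlist(tokens)[0])                # #( LEFT_CURLY E1 E1 E1 E1 )
```

Note that expr_comma_list hands back a flat four-element list, so the
shape of the final tree is decided entirely by the "^" in exprlist.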

The single-digit match you are seeing as a tree (e.g., in expr, above)
is really just a list with one element, which happens to work as if it
were a tree.
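
To make the single-digit case and the repeated "^" case concrete, here
is another toy Python sketch. I am assuming a grammar along the lines of
expr: mexpr (PLUS^ mexpr)* ; mexpr: atom (STAR^ atom)* ; atom: INT ;
which is a guess at what yours looks like. Node, Rule and make_root are
invented names for illustration and do not correspond to ANTLR's own
classes.

```python
class Node:
    """A tree node: token text plus zero or more children."""
    def __init__(self, text, children=()):
        self.text = text
        self.children = list(children)

    def __repr__(self):
        if not self.children:
            return self.text
        parts = [self.text] + [repr(c) for c in self.children]
        return "#( " + " ".join(parts) + " )"


class Rule:
    """Hypothetical per-production state: an optional root and a sibling list."""
    def __init__(self):
        self.root = None
        self.siblings = []

    def add(self, nodes):
        # Default behaviour: append to the current list.
        self.siblings.extend(nodes)

    def make_root(self, text):
        # "^": the new node becomes the parent of the *current* result.
        current = self.result()
        self.root = text
        self.siblings = current

    def result(self):
        if self.root is None:
            return list(self.siblings)
        return [Node(self.root, self.siblings)]


def atom(tokens):
    rule = Rule()
    rule.add([Node(tokens.pop(0))])       # INT
    return rule.result()


def mexpr(tokens):
    rule = Rule()
    rule.add(atom(tokens))
    while tokens and tokens[0] == "*":
        rule.make_root(tokens.pop(0))     # STAR^
        rule.add(atom(tokens))
    return rule.result()


def expr(tokens):
    rule = Rule()
    rule.add(mexpr(tokens))
    while tokens and tokens[0] == "+":
        rule.make_root(tokens.pop(0))     # PLUS^ (may fire more than once)
        rule.add(mexpr(tokens))
    return rule.result()


print(expr(["3"]))                        # [3]
print(expr(["3", "+", "4", "*", "5"]))    # [#( + 3 #( * 4 5 ) )]
print(expr(["1", "+", "2", "+", "3"]))    # [#( + #( + 1 2 ) 3 )]
```

In this sketch mexpr and atom never need annotations of their own:
whatever list they return simply lands in the caller's current list,
and a lone digit comes back as a one-element list with no root at all.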

> As expected. But how does this happen? Only PLUS and STAR have AST
> annotations - how does ANTLR decide where to put 'mexpr' and 'atom' in
> the tree? And if I just enter a single digit as an 'expr' then the tree
> (of course) only has one entry - but is it a root and, if so, how?