[antlr-interest] Trying to keep whitespace in an AST

Fri Feb 8 14:44:17 PST 2008

On Fri, Feb 08, 2008 at 10:22:37AM -0800, Jim Idle wrote:
> Now, when you walk your AST and find a method, you just need the token
> index of the start sequence of your method declaration (this of course 
> depends on the language). Then you can traverse backwards in the token 
> stream (the stream you passed to the parser, mostly CommonTokenStream) 
> for that index, and pick up any off-channel tokens that were ignored by 
> the parser.

I've implemented this approach, more or less, but with additional effort
to take into account changes to the AST.
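
For anyone following along, the backwards walk Jim describes can be
sketched roughly as below. This is only an illustration: Tok and
HiddenTokens are hypothetical plain-Java stand-ins rather than ANTLR
classes, and the channel numbers are placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a lexer token; only the fields the sketch needs.
final class Tok {
    static final int DEFAULT_CHANNEL = 0;  // tokens the parser sees
    static final int HIDDEN_CHANNEL = 99;  // whitespace, comments (illustrative value)
    final String text;
    final int channel;
    Tok(String text, int channel) { this.text = text; this.channel = channel; }
}

final class HiddenTokens {
    // Walk backwards from startIndex and collect the run of off-channel
    // tokens that immediately precedes it, returned in source order.
    static List<Tok> leadingHidden(List<Tok> tokens, int startIndex) {
        List<Tok> found = new ArrayList<>();
        for (int i = startIndex - 1; i >= 0; i--) {
            Tok t = tokens.get(i);
            if (t.channel == Tok.DEFAULT_CHANNEL) {
                break;  // reached the previous token the parser actually saw
            }
            found.add(t);
        }
        Collections.reverse(found);
        return found;
    }
}
```

Given the start index of a method declaration, leadingHidden returns
the whitespace and doc-comment tokens sitting directly in front of it.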

What made most sense to me was to alter the token stream to account for
the AST changes.  For example, if I remove a method, I delete everything
in the stream between the child.start and child.stop tokens, inclusive.

Of course, CommonTree nodes store start/stop *indexes*, which will leave
lots of AST nodes with invalid indexes if we actually change the
underlying token-stream array.
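
A toy illustration of the stale-index problem, using a plain list of
strings in place of the real token array (StaleIndexDemo and its method
are made up for this example):

```java
import java.util.ArrayList;
import java.util.List;

final class StaleIndexDemo {
    // Removes tokens [from, to] inclusive from a copy of the list, then
    // reports what now sits at an index recorded before the removal.
    // Returns null if the recorded index has fallen off the end.
    static String tokenAtAfterRemoval(List<String> tokens, int from, int to,
                                      int recordedIndex) {
        List<String> copy = new ArrayList<>(tokens);
        copy.subList(from, to + 1).clear();
        return recordedIndex < copy.size() ? copy.get(recordedIndex) : null;
    }
}
```

Every node that starts after the removed range ends up pointing at the
wrong token, or at nothing at all, unless its indexes are renumbered.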

My solution was to switch the token-stream from an array to a
doubly-linked list, and to replace start/stop indexes with references to
the actual token objects.  Now if a subtree is deleted, I can unlink the
appropriate sublist from the token-stream, and all remaining start/stop
references in the AST will still be valid.
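
A stripped-down sketch of the idea follows. It is not the metaas code;
LinkedTok, AstNode and LinkedTokenStream are hypothetical names, and
the real thing has to deal with much more (insertion, token types,
channels, and so on).

```java
// Hypothetical token that doubles as a node in a doubly-linked list.
final class LinkedTok {
    final String text;
    LinkedTok prev, next;
    LinkedTok(String text) { this.text = text; }
}

// Hypothetical AST node holding token references instead of stream indexes.
final class AstNode {
    final LinkedTok start, stop;
    AstNode(LinkedTok start, LinkedTok stop) { this.start = start; this.stop = stop; }
}

final class LinkedTokenStream {
    // Sentinels, so unlinking never has to special-case the list ends.
    private final LinkedTok head = new LinkedTok("<head>");
    private final LinkedTok tail = new LinkedTok("<tail>");

    LinkedTokenStream() {
        head.next = tail;
        tail.prev = head;
    }

    // Append a new token at the end of the stream and return it.
    LinkedTok append(String text) {
        LinkedTok t = new LinkedTok(text);
        t.prev = tail.prev;
        t.next = tail;
        tail.prev.next = t;
        tail.prev = t;
        return t;
    }

    // Unlink the sublist node.start..node.stop, inclusive. No other
    // token is touched, so references held by other nodes stay valid.
    void remove(AstNode node) {
        LinkedTok before = node.start.prev;
        LinkedTok after = node.stop.next;
        before.next = after;
        after.prev = before;
        node.start.prev = null;
        node.stop.next = null;
    }

    // Concatenate the text of every token still in the stream.
    String render() {
        StringBuilder sb = new StringBuilder();
        for (LinkedTok t = head.next; t != tail; t = t.next) {
            sb.append(t.text);
        }
        return sb.toString();
    }
}
```

Removing a subtree is then a constant-time unlink, and rendering the
stream afterwards gives back the original source minus the deleted
span, with the surrounding whitespace and comments left as they were.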

It's quite a lot of effort to maintain that list of tokens, and some
things aren't as nice as I'd wish, but I can happily refactor AST
subtrees while maintaining the formatting of the rest of the file.
That includes keeping doc-comments connected to methods, etc.

Code here,

  http://svn.badgers-in-foil.co.uk/metaas/trunk/src/main/java/uk/co/badgersinfoil/metaas/impl/antlr/

ta,
dave

-- 
http://david.holroyd.me.uk/