[antlr-interest] Adding imaginary tokens to the TokenStream in the PARSER? (Bad idea)

David Holroyd dave at badgers-in-foil.co.uk
Tue Jul 31 01:19:15 PDT 2007


On Mon, Jul 30, 2007 at 10:59:27AM -0600, Susan Jolly wrote:
> OK, I've now answered my own question in the negative. One does NOT want to
> actually insert new tokens into an existing TokenStream because the
> TokenStream is a List, not a LinkedList, and each Token keeps track of its
> index into this list.  If you were to insert a new Token somewhere in the
> middle of the List, you'd have to reset all of the indices for Tokens
> following the insert.

I don't know if you've already got another approach worked out, but I've
implemented something that sounds very close to what you're describing:
tokens held in a linked list, plus 'special' tokens added by the parser
to mark insertion points in the token stream (I gave them the name
'placeholder' tokens).
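
As a rough illustration of the linked-list idea (a minimal sketch only: the
LinkedToken class and its methods are hypothetical names made up for this
example, not ANTLR's Token API and not the actual metaas code), a token
that carries prev/next links instead of a list index can have a neighbour
spliced in without renumbering anything after it:

```java
// Hypothetical illustration; not ANTLR's Token interface nor the metaas code.
final class LinkedToken {
    final String type;
    final String text;
    LinkedToken prev, next;

    LinkedToken(String type, String text) {
        this.type = type;
        this.text = text;
    }

    /** Link the given tokens in order and return the first one. */
    static LinkedToken chain(LinkedToken... toks) {
        for (int i = 1; i < toks.length; i++) {
            toks[i - 1].next = toks[i];
            toks[i].prev = toks[i - 1];
        }
        return toks[0];
    }

    /** Splice newTok in directly before this token; only the two neighbours change. */
    void insertBefore(LinkedToken newTok) {
        newTok.prev = prev;
        newTok.next = this;
        if (prev != null) prev.next = newTok;
        prev = newTok;
    }

    /** Concatenate token text from head onward; a placeholder has empty text. */
    static String render(LinkedToken head) {
        StringBuilder sb = new StringBuilder();
        for (LinkedToken t = head; t != null; t = t.next) sb.append(t.text);
        return sb.toString();
    }
}
```

Because a placeholder here has empty text, adding one changes nothing in
the rendered source; it only records a position.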

My use-case starts with a desire to support format-preserving
refactoring, and an AST which contains imaginary nodes:

  (CLASS_DEF
    ANNOTATIONS  <- imaginary
    MODIFIERS  <- imaginary
    IDENT
    (TYPE_BLOCK ...)
  )

but where some of the imaginary nodes may have no children (i.e. a class
may not actually have any annotations or modifier-keywords).  After tree
construction, the application processes the AST, and may want to do
something like add the 'public' keyword to the class def.

So, after the parser executes the 'modifiers' rule (which builds the
imaginary MODIFIERS AST node) I run an action which inserts a
PLACEHOLDER token into the token stream if-and-only-if the MODIFIERS
node lacks any real children.  The start/stop tokens for the MODIFIERS
node are then pointed at this PLACEHOLDER token (otherwise, ANTLR seems
to point start/stop at a 'nearby' token in the stream).

Then, when the application comes to add a PUBLIC child to the MODIFIERS
node, a special case will notice that a PLACEHOLDER token is present,
and the placeholder will be replaced in the token stream with the
token(s) that belong to the PUBLIC node.
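
The two steps just described might look roughly like this. It is a sketch
under assumptions, not the code from the grammar: Tok, AstNode, Placeholders,
markIfEmpty() and addChild() are all hypothetical names, the token passed as
'following' is assumed to be the first real token after the empty node, and
the new child's tokens are assumed to be linked to each other already.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not ANTLR's API nor the metaas code.
final class Tok {
    final String type, text;
    Tok prev, next;
    Tok(String type, String text) { this.type = type; this.text = text; }
}

final class AstNode {
    final String type;
    final List<AstNode> children = new ArrayList<>();
    Tok start, stop;
    AstNode(String type) { this.type = type; }
}

final class Placeholders {
    /** Parser-side step: mark an empty imaginary node's position just before 'following'. */
    static void markIfEmpty(AstNode node, Tok following) {
        if (!node.children.isEmpty()) return; // real children already supply start/stop
        Tok ph = new Tok("PLACEHOLDER", "");
        ph.prev = following.prev;
        ph.next = following;
        if (following.prev != null) following.prev.next = ph;
        following.prev = ph;
        node.start = node.stop = ph;
    }

    /** Application-side step: add a child, swapping out the placeholder if one is present. */
    static void addChild(AstNode parent, AstNode child, Tok first, Tok last) {
        child.start = first;
        child.stop = last;
        if (parent.start != null && "PLACEHOLDER".equals(parent.start.type)) {
            Tok ph = parent.start;
            first.prev = ph.prev;
            last.next = ph.next;
            if (ph.prev != null) ph.prev.next = first;
            if (ph.next != null) ph.next.prev = last;
            parent.start = first;
            parent.stop = last;
        }
        // Inserting next to existing (non-placeholder) children is omitted from this sketch.
        parent.children.add(child);
    }
}
```

For example, calling markIfEmpty() on an empty MODIFIERS node with the
'class' keyword token as 'following', and later addChild() with the tokens
"public" and " ", leaves a stream that reads "public class ..." with
MODIFIERS start/stop pointing at the new tokens.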

If there were no PLACEHOLDER token in the token stream, it would be
very difficult for the application code updating the AST to know exactly
where in the token stream to insert new tokens when a new child is added
to a previously empty imaginary AST node.


Does that make any sense? :)


The grammar in question is:

  http://svn.badgers-in-foil.co.uk/metaas/trunk/src/main/antlr/org/asdt/core/internal/antlr/AS3.g3

(i.e. see the placeholder() function defined near the top)


I'm not convinced that I've got the best approach though, since I'm
forever finding (and creating) bugs.


hope it helps, anyway,
dave

-- 
http://david.holroyd.me.uk/
