[antlr-interest] Preserving Whitespace

mzukowski at yci.com mzukowski at yci.com
Fri Jun 14 12:00:11 PDT 2002


> I'm doing something along the lines of the "Preserve whitespace" example
> from ANTLR distribution. Take the AST, and write it back with small
> changes.
> The example is very useful, but ... in practice you don't have only tokens
> ignored before the parser, you also ignore some in parser rules. For
> instance, in the "Preserve whitespace" example one would normally ignore
> SEMI while constructing the AST:
> 
> stat : ID ASSIGN^ expr SEMI! ; // see the ! after SEMI
> 
> Now SEMI is not present in the AST, and I don't know how I can retrieve
> hidden tokens before/after SEMI. In an ideal world, I would like to have
> SEMI treated exactly as if it was ignored before the parser, just like
> another hidden token. But it seems that SEMI is lost after the parser,
> together with its hidden tokens.
> 
> If I'm wrong, please correct me, it will save me a lot of time.

Have a look at TokenStreamHiddenTokenFilter.java.  If it is as simple
as knowing that all SEMIs are going to be dropped by the parser then
you could add a new class of tokens which are passed to the parser
but also preserved in the hidden token stream.  But that's a hack and
won't work for tokens that are only sometimes preserved.

I've detailed another approach before on this list which I think is
very general and I'd love to get somebody to implement it ;) Basically
you keep the original file around and every Token you create
represents a region in that file (start and extent).  When it comes
time to print out your Tokens you keep track of the previously printed
token and if some whitespace existed between those two tokens
previously then copy it to the output. There are some boundary cases
to handle as well as what to do with synthesized tokens that weren't
present in the original code.  If you look at the GCC grammar you can
see my first stab at it.  I gave up because of some reported bugs,
but I was never convinced that code was causing them, so I just
commented it out instead of deleting it.  With that code I was
actually trying to handle nested #line directives myself, instead of
preserving them as whitespace too.
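
To make the region idea concrete, here is a minimal sketch in plain
Java.  It is not the code from the GCC grammar and does not use any
ANTLR classes; RegionPrinter, Tok and print are hypothetical names
made up for illustration.  It assumes every original token records
its start offset and length in the source text, and that synthesized
tokens carry a start of -1.  One choice in it is mine and not
something described above: it copies the whole gap between two
neighbouring tokens verbatim, so text belonging to tokens the parser
dropped (such as SEMI) comes back together with the whitespace and
comments around it.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionPrinter {

    /** A token that remembers which region of the original source it came from. */
    public static final class Tok {
        public final String text; // text to emit (may differ from the original after edits)
        public final int start;   // offset in the original source, or -1 if synthesized
        public final int length;  // extent in the original source

        public Tok(String text, int start, int length) {
            this.text = text;
            this.start = start;
            this.length = length;
        }

        /** A token that was not present in the original source. */
        public static Tok synthesized(String text) {
            return new Tok(text, -1, 0);
        }

        boolean isOriginal() { return start >= 0; }
        int end() { return start + length; }
    }

    /**
     * Prints the tokens in order, copying the original text found between
     * consecutive original tokens (whitespace, comments, dropped tokens).
     */
    public static String print(String source, List<Tok> tokens) {
        StringBuilder out = new StringBuilder();
        Tok prev = null; // last printed token that has an original region

        for (Tok t : tokens) {
            if (t.isOriginal()) {
                int gapStart = (prev == null) ? 0 : prev.end();
                if (t.start >= gapStart) {
                    // Tokens are still in source order: copy the gap verbatim.
                    out.append(source, gapStart, t.start);
                } else if (out.length() > 0) {
                    // Boundary case: token was moved, no meaningful gap to copy.
                    out.append(' ');
                }
                prev = t;
            } else if (out.length() > 0) {
                // Boundary case: synthesized token, separate with one space.
                out.append(' ');
            }
            out.append(t.text);
        }

        // Copy whatever followed the last original token.
        if (prev != null) {
            out.append(source, prev.end(), source.length());
        }
        return out.toString();
    }
}
```

With the source "x = 1 ;  // keep" followed by "y=2;" on the next
line, and a token list that omits both SEMIs and replaces the text
of "1" with "42", print gives back the same two lines with only the
1 changed.  A token that was deliberately deleted from the tree would
also reappear under this scheme, so a real implementation would need
to track deletions separately.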

Monty
http://www.codetransform.com/gcc.html

 
