[antlr-interest] Preserving Whitespace

Fri Jun 14 12:25:36 PDT 2002

--On 14/06/2002 12:00 PM -0700 mzukowski wrote:
> I've detailed another approach before on this list which I think is
> very general and I'd love to get somebody to implement it ;) Basically
> you keep the original file around and every Token you create
> represents a region in that file (start and extent).

Hmm, interesting, I've been considering doing something similar. Currently
we use the preserve whitespace stuff (with hidden tokens) but those aren't
sufficient because of code that gets discarded due to preprocessing. I
*was* going to try to preserve the stuff that got preprocessed out as well
(theoretically possible - our lexer does the preprocessing and lexing all
at the same time). However, I think now I'm going to toss that idea.

I'm thinking instead of keeping unprocessed (raw) token streams for each
file, with character positions into the original files. Those can be
compared to file and character positions which are in the nodes in our AST.
For each node in the AST which gets printed, we'll look for the related
token in the raw token list, and check for stuff which got preprocessed out.
(As well as watch for code that got substituted in via the preprocessor.)
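
A minimal sketch of that lookup, in Python for brevity rather than as ANTLR
code. The names RawToken, raw_tokens and discarded_between are hypothetical,
the "lexer" here is just a regex that splits on whitespace, and the #if
syntax in the sample is only a stand-in for whatever the real preprocessor
accepts. The point is only that raw tokens carrying character offsets can be
matched against the positions stored in AST nodes:

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RawToken:
    """A region of the original file: start offset plus extent (len(text))."""
    text: str
    start: int

    @property
    def end(self) -> int:
        return self.start + len(self.text)


def raw_tokens(source: str) -> list:
    """Unprocessed token stream: every character of the file is covered,
    whitespace included, so nothing is lost before preprocessing."""
    return [RawToken(m.group(), m.start())
            for m in re.finditer(r"\s+|\S+", source)]


def discarded_between(raw: list, prev_end: int, next_start: int) -> list:
    """Raw tokens lying between two AST node positions. Anything found here
    never reached the tree (whitespace, comments, preprocessed-out code)."""
    return [t for t in raw if t.start >= prev_end and t.end <= next_start]


# Assumed example: the tree only contains the x and z statements, because
# the middle block was preprocessed out.
source = "x = 1;\n#if 0\ny = 2;\n#endif\nz = 3;\n"
raw = raw_tokens(source)
x_end = len("x = 1;")          # end position recorded in the first AST node
z_start = source.index("z")    # start position recorded in the next AST node
gap = discarded_between(raw, x_end, z_start)
print(repr("".join(t.text for t in gap)))
```

Code substituted in by the preprocessor would need the opposite check (AST
nodes with no matching raw token at their position), which this sketch does
not attempt.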

That should allow us to regenerate original source code, but also fiddle
with the parse tree and then re-write the code, as if it were a programmer
doing code tweaks.
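
One way this could look, again as a rough Python sketch under assumptions and
not a description of any existing tool: each printed node (the hypothetical
Node class below) remembers the span of the original file it came from, plus
optional replacement text if the tree was modified. Regeneration copies the
original characters for everything untouched, including the gaps between
nodes, and only emits new text for edited nodes. Nodes are assumed not to
overlap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical printable AST node with its span in the original file."""
    start: int
    end: int
    new_text: Optional[str] = None  # set only when the tree was modified


def regenerate(source: str, nodes: list) -> str:
    """Rebuild the file from the original text plus any edited nodes."""
    out = []
    pos = 0
    for node in sorted(nodes, key=lambda n: n.start):
        # Copy the gap before this node verbatim: whitespace, comments,
        # and code that was preprocessed out all live here.
        out.append(source[pos:node.start])
        if node.new_text is None:
            out.append(source[node.start:node.end])  # untouched: original text
        else:
            out.append(node.new_text)                # edited: replacement text
        pos = node.end
    out.append(source[pos:])  # trailing text after the last node
    return "".join(out)


source = "x = 1;  /* keep */\ny = 2;\n"
y = source.index("y = 2;")
nodes = [Node(0, 6), Node(y, y + 6)]
nodes[1].new_text = "y = 42;"   # a tweak made on the tree
print(regenerate(source, nodes))
```

With no edited nodes the output is the original file, character for
character; with edits, only the edited spans change.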

I wish I'd had a clearer picture of this two years ago. :-)

John
www.joanju.com
