[antlr-interest] Retaining comments

Tue Mar 11 12:45:47 PDT 2008

OK, I'm going to have to do this as well. However, my dream would be....

Can we use/generate an XML AST, with the text nodes corresponding exactly to
the input source received at the lexer, and the elements corresponding to
the AST tags? I know there are all sorts of complexities with this, but it
would enable several outcomes:

1. Using fast and general tree processing via XML and DOM, maybe even using
XPath and XQuery
2. Easy filtering via the above
3. Clear mapping between AST and text, which is not currently easy
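
To make the idea concrete, here is a minimal sketch in Python using the
standard xml.etree.ElementTree module. The tree is built by hand for a
made-up one-line input; the element names (PROGRAM, ASSIGN, ID, ADD, INT,
COMMENT) and the helper names are assumptions for illustration, not anything
ANTLR generates. The point is only that the text nodes, read in document
order, reproduce the source exactly, while the elements can be queried with
the limited XPath subset that ElementTree supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; every character of it must end up in some text node.
SOURCE = "x = 1 + 2; // sum\n"


def build_tree():
    """Hand-built XML AST; a real tree writer would derive this from the parse."""
    root = ET.Element("PROGRAM")
    assign = ET.SubElement(root, "ASSIGN")

    ident = ET.SubElement(assign, "ID")
    ident.text = "x"
    ident.tail = " = "          # text between this element and the next one

    add = ET.SubElement(assign, "ADD")
    left = ET.SubElement(add, "INT")
    left.text = "1"
    left.tail = " + "
    right = ET.SubElement(add, "INT")
    right.text = "2"
    add.tail = ";"              # the terminator sits inside ASSIGN, after ADD

    assign.tail = " "           # whitespace placed outside ASSIGN (a choice)
    comment = ET.SubElement(root, "COMMENT")
    comment.text = "// sum"
    comment.tail = "\n"
    return root


def source_text(root):
    """Concatenate all text nodes in document order."""
    return "".join(root.itertext())


def texts_of(root, tag):
    """Filter the tree with a simple XPath expression."""
    return [e.text for e in root.findall(f".//{tag}")]


if __name__ == "__main__":
    tree = build_tree()
    print(source_text(tree) == SOURCE)
    print(texts_of(tree, "INT"))
    print(texts_of(tree, "COMMENT"))
```

Where the whitespace and the ";" end up (inside or outside ASSIGN) is an
arbitrary choice in this sketch, which is exactly the kind of decision a
general writer would have to make.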

Although I have not looked into this completely yet (and I will), it seems
most of this could be done with an additional AST writer. I wrote one that
emits XML but does not preserve the input text. I had to, because the
current AST notation (which I wanted to read for processing) cannot
distinguish, say, between an imaginary token "FUNCTION" and a language
identifier written in uppercase as "FUNCTION", unless I tagged absolutely
every single thing in the grammar, which was tedious. There are all sorts of
other nasty cases (e.g., does whitespace fall inside or outside of
particular elements?). In particular, preserving the text would require a
mapping between imaginary tokens and text positions, which is not always
possible.
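
The FUNCTION ambiguity can be shown with a small Python sketch. The Node
class, the "imaginary" flag, and the <tok> element are all hypothetical names
for illustration; the parenthesised writer only imitates the general shape
of a LISP-style tree dump and is not ANTLR's own code.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import quoteattr


@dataclass
class Node:
    text: str
    imaginary: bool = False          # True if the grammar invented this token
    children: list = field(default_factory=list)


def to_string_tree(node):
    """Parenthesised notation: imaginary and real tokens look identical."""
    if not node.children:
        return node.text
    inner = " ".join(to_string_tree(c) for c in node.children)
    return f"({node.text} {inner})"


def to_xml(node):
    """Imaginary tokens become element names; real tokens become <tok> elements."""
    inner = "".join(to_xml(c) for c in node.children)
    if node.imaginary:
        return f"<{node.text}>{inner}</{node.text}>"
    return f"<tok text={quoteattr(node.text)}>{inner}</tok>"


# An imaginary FUNCTION node wrapping an identifier that happens to be
# spelled FUNCTION, plus one parameter named x.
tree = Node("FUNCTION", True, [Node("FUNCTION"), Node("x")])

if __name__ == "__main__":
    print(to_string_tree(tree))
    print(to_xml(tree))
```

In the parenthesised form the two FUNCTIONs cannot be told apart by a reader
of the output, whereas the XML form keeps them distinct without tagging each
rule in the grammar by hand. Note that this sketch still drops whitespace and
source positions.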

I'm developing a system that will annotate code, generating both
human-readable output and a component index. The former pushes you toward
text output, the latter toward an AST; I've ended up needing both, largely
because of similar issues. It seems it may be fairly simple to develop this
kind of tree writer for cases like these.

Any thoughts on this? Am I crazy/doing it all wrong?

--S

-----Original Message-----
From: Terence Parr [mailto:parrt at cs.usfca.edu]
Sent: Tuesday, March 11, 2008 12:43 PM
To: bmeike at speakeasy.net
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Retaining comments

Send comments to the parser on a different channel, then look in the token
buffer for them between "real" tokens.
Ter
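
A minimal Python sketch of that lookup, independent of ANTLR's actual
classes: the Token type, the channel constants, and the function names are
assumptions made for illustration. The idea is that the full token buffer
still holds the off-channel tokens, so from the index of any "real" token
you can walk backwards until you hit the previous on-channel token.

```python
from dataclasses import dataclass

# Channel numbers are arbitrary here; only "default" vs "not default" matters.
DEFAULT, HIDDEN = 0, 1


@dataclass
class Token:
    type: str
    text: str
    channel: int = DEFAULT


def hidden_before(tokens, index):
    """Return the off-channel tokens sitting directly before tokens[index]."""
    found = []
    i = index - 1
    while i >= 0 and tokens[i].channel != DEFAULT:
        found.append(tokens[i])
        i -= 1
    found.reverse()                  # restore source order
    return found


def comments_before(tokens, index):
    """Text of the comment tokens attached to the token at the given index."""
    return [t.text for t in hidden_before(tokens, index) if t.type == "COMMENT"]


# Hypothetical buffer for the input "/* total */\nx = 1".
BUFFER = [
    Token("COMMENT", "/* total */", HIDDEN),
    Token("WS", "\n", HIDDEN),
    Token("ID", "x"),
    Token("WS", " ", HIDDEN),
    Token("ASSIGN", "="),
    Token("WS", " ", HIDDEN),
    Token("INT", "1"),
]

if __name__ == "__main__":
    print(comments_before(BUFFER, 2))   # comments in front of the ID token
    print(comments_before(BUFFER, 4))   # only whitespace in front of "="
```

Whether a comment belongs to the token after it or the one before it is a
policy decision; this sketch attaches it to the following token.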

On Feb 27, 2008, at 1:19 PM, bmeike at speakeasy.net wrote:

On Wed Feb 27 12:29 , Gavin Lambert sent:

> This will keep the comment tokens in the token stream at the 
> appropriate points. To transfer them you'll have to add some code 
> that looks for comment tokens nearby recognised parser constructs 
> so you can emit them at the right place in the output.

Sounds great. What do you mean by "looks for comment tokens"? As far as I
can tell, the parser only sees the DEFAULT channel. Where do I look to
find nearby tokens?

Thanks!
  Blake Meike
