[antlr-interest] Retaining comments
Loring Craymer
lgcraymer at yahoo.com
Tue Mar 11 22:41:53 PDT 2008
You can do XML and DOM--ANTLR 2 had an AST serializer built in--but there is not much point to doing so other than that you have some familiarity with the tools. For any vertical translation problem (one language to translate), ANTLR will be faster (XML processing is _slow_ from a machine perspective), more powerful, and easier to use if you learn how to use ANTLR effectively. There are horizontal problems--extracting information from a collection of trees generated by different source languages and different translators--for which XML is usable, but again this is not the way to go if you are comfortable with language processing technology.
The value of XML is that it is an agreed-upon format for structured text that is portable and can be adapted for general information retrieval ("the semantic web")--or at least has that as a hoped-for goal. It is not a technology for language processing; indeed, the XML community seems to be almost allergic to language processing technology. "Everything is a tree" does not remove the need for grammars--the XML community calls them "schemas" and writes applications in XSLT to convert from one schema to another without intermediate analysis.
You might also take a look at Ter's rant on XML, http://www.ibm.com/developerworks/xml/library/x-sbxml.html.
--Loring
----- Original Message ----
From: Stuart Watt <SWatt at infobal.com>
To: Terence Parr <parrt at cs.usfca.edu>; bmeike at speakeasy.net
Cc: antlr-interest at antlr.org
Sent: Tuesday, March 11, 2008 12:45:47 PM
Subject: Re: [antlr-interest] Retaining comments
OK,
I'm going to have to do this as well. However, my dream would be....
Can we use/generate an XML AST, with the text nodes corresponding exactly to
the input source received at the lexer, and the elements corresponding to the
AST tags? I know there are all sorts of complexities with this, but it enables
several outcomes:
1. Using fast and general tree processing via XML and DOM, maybe even using
   XPath and XQuery
2. Easy filtering via the above
3. Clear mapping between AST and text, which is not currently easy
Although I have not completely looked into this yet (and I will), it seems
most of this could be done fine using an additional AST writer. I wrote one
which does the XML, but does not preserve the input text. In the end, I had to
do this, as the current AST notation (which I wanted to read for processing) was
unable to distinguish, say, between an imaginary token "FUNCTION" and a
language identifier written as uppercase "FUNCTION", unless I tagged absolutely
every single thing in the grammar, which was tedious. There are all sorts of
other nasty cases (e.g., does whitespace fall inside or outside of particular
elements). And in particular, this would require some mapping between imaginary
tokens and text positions which is not always possible.
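A minimal sketch of the kind of tree writer described above, written in Python with the standard xml.etree.ElementTree module rather than against the ANTLR runtime. The nested-tuple AST shape, the tag names, and the helper name to_xml are all hypothetical, chosen only for illustration. Imaginary tokens become element names and source text becomes text nodes, so an imaginary "FUNCTION" and an identifier spelled "FUNCTION" stay distinguishable, and concatenating the text nodes gives back the input. The sketch assumes every character of the input has already been assigned to some node, which is the hard part noted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical AST shape: (tag, children), where each child is either a
# nested (tag, children) node or a string of raw source text.
ast = ("FUNCTION", [
    "function ",
    ("NAME", ["FUNCTION"]),   # an identifier that happens to be spelled FUNCTION
    "() ",
    ("BODY", ["{ }"]),
])

def to_xml(node):
    """Convert a (tag, children) node into an Element, keeping all source text."""
    tag, children = node
    elem = ET.Element(tag)
    last = None  # most recently appended child element, if any
    for child in children:
        if isinstance(child, str):
            if last is None:
                # Text before the first child element is the element's own text.
                elem.text = (elem.text or "") + child
            else:
                # Text after a child element is stored as that child's tail.
                last.tail = (last.tail or "") + child
        else:
            last = to_xml(child)
            elem.append(last)
    return elem

root = to_xml(ast)

# Concatenating every text node reproduces the original source.
source = "".join(root.itertext())

# Filtering with the XPath subset that ElementTree supports.
names = [e.text for e in root.findall(".//NAME")]

print(ET.tostring(root, encoding="unicode"))
```

Here root.tag is the imaginary token, while the identifier appears only as text inside the NAME element. Whitespace placement (inside or outside an element) is decided by whoever builds the tree; the sketch simply preserves whatever it is given.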
I'm developing a system which will annotate code, both generating
human-readable output and a component index. The one pushes you to a text
output, the other to an AST - I've ended up needing both, largely because of
similar issues. It seems it may be fairly simple to develop this kind of tree
writer for cases like these.
Any thoughts on this? Am I crazy/doing it all wrong?
--S
-----Original Message-----
From: Terence Parr [mailto:parrt at cs.usfca.edu]
Sent: Tuesday, March 11, 2008 12:43 PM
To: bmeike at speakeasy.net
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Retaining comments
send comments to parser on different channel. then look in token buffer for them between "real" tokens.
Ter
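The suggestion above can be sketched roughly as follows. This is a self-contained Python model, not ANTLR runtime code: the Token class, the DEFAULT and HIDDEN channel constants, and the helper hidden_before are hypothetical names introduced for illustration. The idea it shows is that the buffer keeps every token, the parser is only handed the default-channel ones, and the comments are recovered afterwards by walking backwards through the buffer from a "real" token.

```python
from dataclasses import dataclass

# Hypothetical channel numbers, for illustration only.
DEFAULT, HIDDEN = 0, 1

@dataclass
class Token:
    text: str
    channel: int = DEFAULT

def hidden_before(tokens, index):
    """Return the off-channel tokens immediately preceding tokens[index], in source order."""
    found = []
    i = index - 1
    # Walk backwards until the previous default-channel token (or the start).
    while i >= 0 and tokens[i].channel != DEFAULT:
        found.append(tokens[i])
        i -= 1
    found.reverse()
    return found

# The full buffer holds every token the lexer produced, comments included.
buffer = [
    Token("/* adds one */", HIDDEN),
    Token("\n", HIDDEN),
    Token("int"),
    Token(" ", HIDDEN),
    Token("f"),
]

# What the parser would be shown: default-channel tokens only.
parser_view = [t.text for t in buffer if t.channel == DEFAULT]

# Comments sitting just in front of the "int" token (buffer index 2).
comments = [t.text for t in hidden_before(buffer, 2) if t.text.startswith("/*")]
```

In a real translator the index passed to hidden_before would come from the start token of whatever construct the parser just recognised, so the comment can be emitted next to the corresponding output.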
On Feb 27, 2008, at 1:19 PM, <bmeike at speakeasy.net> wrote:
On Wed Feb 27 12:29, Gavin Lambert sent:
> This will keep the comment tokens in the token stream at the
> appropriate points. To transfer them you'll have to add some code
> that looks for comment tokens nearby recognised parser constructs
> so you can emit them at the right place in the output.
Sounds great. What do you mean by "looks for comment tokens"? As far as I can tell, the parser only sees the DEFAULT channel. Where do I look to find nearby tokens?
Thanks!
Blake Meike