[antlr-interest] Want to write a fairly simple syntax converter...

Fri Apr 11 11:20:24 PDT 2008

Peter Nann wrote:
> 
> I am new to all this language parsing, and I am struggling to understand 
> 'how much I need to understand' (to use a Rumsfeld'ism)
> 
> If I just want to write a fairly simple converter, and keep whitespace 
> fairly intact, how 'dirty' do I have to get my hands, language parsing 
> and code wise?
> 
> To clarify, I want to convert a proprietary format into equivalent XML.
> Something like "x [ a b c ]"   ->  "<rule name=x> <one-of> a b c 
> </one-of> </rule>"
> (But obviously, it gets a little more complicated than that)

Please don't use this kind of XML. It is possible to write a schema 
that declares an element's content to be a whitespace-separated list, 
but in practice it is simpler to work with one element per item (and 
attribute values have to be quoted to be well-formed):

<rule name="x"> <one-of> <elem>a</elem> <elem>b</elem> <elem>c</elem> 
</one-of> </rule>

More verbose yes, but XML wasn't designed with terseness in mind.

> My 2 biggest questions:
> 1) Do I need to worry about 'building trees', accessing the AST or 
> anything like that? Or are the 'snippets' of code you can put in the 
> grammar rules going to get me by?

If the output has the same shape as the parse tree and needs no extra 
computation, then I doubt that you need a tree grammar. Actions in the 
parser rules should get you by.
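
As a rough illustration of emitting the output directly while parsing, 
here is a minimal sketch in Python rather than ANTLR. It is a 
hand-written parser, so nothing in it is ANTLR API. The token pattern, 
the convert function and the one-rule grammar are assumptions made up 
for the toy input above; the real format will certainly need more.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumed token shape for the toy input: a bracket, or a run of
# characters that are neither whitespace nor brackets.
TOKEN = re.compile(r"\[|\]|[^\s\[\]]+")


def convert(text):
    """Translate 'name [ item ... ]' to XML during parsing, with no tree."""
    tokens = TOKEN.findall(text)  # whitespace between tokens is dropped
    pos = 0
    out = []

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    # Hypothetical grammar rule:  rule : NAME '[' NAME* ']'
    name = next_token()
    if name in ("[", "]"):
        raise SyntaxError("expected a rule name, got %r" % name)
    out.append("<rule name=%s>" % quoteattr(name))

    if next_token() != "[":
        raise SyntaxError("expected '[' after the rule name")
    out.append("<one-of>")

    # Each alternative is written out as soon as it is recognised.
    while True:
        tok = next_token()
        if tok == "]":
            break
        if tok == "[":
            raise SyntaxError("nested '[' is not handled in this sketch")
        out.append("<elem>%s</elem>" % escape(tok))

    out.append("</one-of>")
    out.append("</rule>")

    if pos != len(tokens):
        raise SyntaxError("unexpected input after ']'")
    return " ".join(out)


if __name__ == "__main__":
    print(convert("x [ a b c ]"))
```

Every piece of output is produced at the point where the matching 
input is recognised, which is roughly what code snippets inside grammar 
rules do. A tree only becomes worthwhile once the output order differs 
from the input order or several passes are needed.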

> 2) Is maintaining whitespace easily do-able? It seems to get gobbled up 
> with little opportunity to keep it intact. It seems I could maybe 
> tokenize it explicitly as meaningful input, and then be able to simply 
> re-constitute it in the output, or is that just crazy talk and will 
> complicate my grammars too much (with 'WS?' sprinkled everywhere…)

What do you need to retain the whitespace for? XML ignores big parts of 
it anyway, and with a new file format you don't have to follow the 
conventions laid down by your predecessors. Better to ignore the 
original whitespace here.

Johannes