[antlr-interest] Want to write a fairly simple syntax converter...
Johannes Luber
jaluber at gmx.de
Fri Apr 11 11:20:24 PDT 2008
Peter Nann wrote:
>
> I am new to all this language parsing, and I am struggling to understand
> 'how much I need to understand' (to use a Rumsfeld'ism)
>
> If I just want to write a fairly simple converter, and keep whitespace
> fairly intact, how 'dirty' do I have to get my hands, language parsing
> and code wise?
>
> To clarify, I want to convert a proprietary format into equivalent XML.
> Something like "x [ a b c ]" -> "<rule name=x> <one-of> a b c
> </one-of> </rule>"
> (But obviously, it gets a little more complicated than that)
Please don't use this kind of XML. It is possible to write a schema
which declares that an element contains a whitespace-separated list, but
one element per item is generally simpler to work with:
  <rule name="x"> <one-of> <elem>a</elem> <elem>b</elem> <elem>c</elem>
  </one-of> </rule>
(Note that the attribute value has to be quoted for the XML to be
well-formed.)
More verbose, yes, but XML wasn't designed with terseness in mind.
> My 2 biggest questions:
> 1) Do I need to worry about 'building trees', accessing the AST or
> anything like that? Or are the 'snippets' of code you can put in the
> grammar rules going to get me by?
If you want to emit the output in the same shape as the input is parsed,
with no extra computation, then I doubt that you need a tree grammar.
The code snippets (actions) you put in the grammar rules should get you
by.
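To illustrate why no tree is needed for the simple case, here is a
minimal sketch in plain Python (not ANTLR). It is a hand-written
stand-in for what embedded actions would do: it produces the XML
directly from the tokens of "x [ a b c ]" without building any
intermediate tree. The names TOKEN, tokenize and convert are
hypothetical, and the grammar it accepts (one name followed by a
bracketed list of names) is only an assumption based on the example in
the question.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumed token set: identifiers and the two brackets. Whitespace is skipped.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\[|\])")

def tokenize(text):
    """Split the input into tokens, discarding whitespace."""
    text = text.rstrip()
    pos = 0
    tokens = []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def convert(text):
    """Convert one 'name [ item ... ]' rule to XML, with no tree in between."""
    tokens = tokenize(text)
    if len(tokens) < 3 or tokens[1] != "[" or tokens[-1] != "]":
        raise ValueError("expected: name [ item ... ]")
    name, items = tokens[0], tokens[2:-1]
    if name in ("[", "]") or any(t in ("[", "]") for t in items):
        raise ValueError("nested or misplaced brackets are not handled here")
    # One <elem> per list item, as recommended above.
    elems = [f"<elem>{escape(item)}</elem>" for item in items]
    parts = [f"<rule name={quoteattr(name)}>", "<one-of>"]
    parts += elems + ["</one-of>", "</rule>"]
    return " ".join(parts)

if __name__ == "__main__":
    print(convert("x [ a b c ]"))
```

In an ANTLR grammar the same output statements would sit in actions
inside the parser rule. Once the format needs reordering or extra
computation, that is the point where a tree grammar starts to pay off.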
> 2) Is maintaining whitespace easily do-able? It seems to get gobbled up
> with little opportunity to keep it intact. It seems I could maybe
> tokenize it explicitly as meaningful input, and then be able to simply
> re-constitute it in the output, or is that just crazy talk and will
> complicate my grammars too much (with 'WS?' sprinkled everywhere…)
What do you need to retain the whitespace for? Most XML consumers treat
large parts of it as insignificant anyway, and with a new file format
you don't have to follow the conventions laid down by your predecessors.
Better to ignore the original whitespace here.
Johannes