[antlr-interest] Parsing RTF to Braille

Tue Jun 26 06:27:31 PDT 2007

Hi Tim,

I agree that it's a worthwhile (and probably also difficult) project. We plan to use the RTF->HBS-Parser at the (Distance) University of Hagen to process the teaching documents for visually impaired people.

You convinced me that constructing an AST is a good idea; at this point I'm not sure whether templates will be necessary.

Thanks for the tip concerning the 'helper' objects; that seems very reasonable.

Regards,
Daniel.

-----Original Message-----
From: Tim Clark [mailto:timgclark at gmail.com]
Sent: Tuesday, June 26, 2007 9:47 AM
To: Daniel Warner
Subject: Re: [antlr-interest] Parsing RTF to Braille

Hi Daniel
This sounds like a really worthwhile (and difficult) project. My experience tells me that you will benefit a lot from splitting your solution into (at least) three parts:

1. src --> AST
2. at least one AST grammar (i.e. tree parser) to validate, extract useful info etc.
3. AST --> output using templates.
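To make the three stages concrete, here is a minimal hand-written Java sketch for a toy RTF subset (groups, control words such as \b, \b0 and \par, plain text). It deliberately does not use the ANTLR runtime: in a real project stage 1 would be an ANTLR-generated parser that builds the AST and stage 2 a tree grammar. The class and method names and the `<emph>` markup are hypothetical placeholders, not real HBS notation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the three stages for a toy RTF subset only.
// In a real project ANTLR would generate stage 1 (parser + AST) and stage 2 (tree grammar).
public class RtfPipelineSketch {

    // AST node: kind is "group", "ctrl" or "text"; value holds the control word or text.
    static final class Node {
        final String kind;
        final String value;
        final List<Node> children = new ArrayList<>();
        Node(String kind, String value) { this.kind = kind; this.value = value; }
    }

    // Stage 1: source --> AST (recursive descent over the toy subset).
    static final class Parser {
        private final String src;
        private int pos = 0;
        Parser(String src) { this.src = src; }

        Node parseGroup() {
            Node group = new Node("group", "");
            expect('{');
            while (pos < src.length() && src.charAt(pos) != '}') {
                char c = src.charAt(pos);
                if (c == '{') {
                    group.children.add(parseGroup());
                } else if (c == '\\') {
                    int start = ++pos; // skip the backslash
                    while (pos < src.length() && Character.isLetter(src.charAt(pos))) pos++;
                    while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++; // e.g. b0
                    group.children.add(new Node("ctrl", src.substring(start, pos)));
                    if (pos < src.length() && src.charAt(pos) == ' ') pos++; // delimiter space
                } else {
                    int start = pos;
                    while (pos < src.length() && "{}\\".indexOf(src.charAt(pos)) < 0) pos++;
                    group.children.add(new Node("text", src.substring(start, pos)));
                }
            }
            expect('}');
            return group;
        }

        private void expect(char c) {
            if (pos >= src.length() || src.charAt(pos) != c)
                throw new IllegalArgumentException("expected '" + c + "' at offset " + pos);
            pos++;
        }
    }

    // Stage 2: walk the AST, keep structure, drop layout-only control words.
    // Produces a flat list of {kind, text} items where kind is "text", "emph" or "para".
    static void walk(Node node, boolean inheritedEmph, List<String[]> out) {
        boolean emph = inheritedEmph; // formatting is scoped to the enclosing group
        for (Node child : node.children) {
            switch (child.kind) {
                case "ctrl":
                    if (child.value.equals("b")) emph = true;        // bold on
                    else if (child.value.equals("b0")) emph = false; // bold off
                    else if (child.value.equals("par")) out.add(new String[] {"para", ""});
                    // every other control word (fonts, colours, ...) is ignored in this sketch
                    break;
                case "group":
                    walk(child, emph, out);
                    break;
                default: // "text"
                    out.add(new String[] {emph ? "emph" : "text", child.value});
            }
        }
    }

    // Stage 3: items --> output via simple string templates ("<emph>" is NOT real HBS).
    static String render(List<String[]> items) {
        StringBuilder sb = new StringBuilder();
        for (String[] item : items) {
            switch (item[0]) {
                case "emph": sb.append(String.format("<emph>%s</emph>", item[1])); break;
                case "para": sb.append('\n'); break;
                default: sb.append(item[1]);
            }
        }
        return sb.toString();
    }

    public static String convert(String rtf) {
        Node ast = new Parser(rtf).parseGroup(); // stage 1
        List<String[]> items = new ArrayList<>();
        walk(ast, false, items);                 // stage 2
        return render(items);                    // stage 3
    }

    public static void main(String[] args) {
        System.out.println(convert("{\\rtf1 Hello {\\b world}\\par Bye}"));
    }
}
```

Keeping the stages separate means the output templates in stage 3 can change without touching the parser, and the filtering decisions in stage 2 live in one place.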

The last step might not produce the final output, but rather a simplified language that represents what you want the final output to be. Then you repeat the parse, analyse, output cycle on that language! The advantage of this is that it is much easier to check the structure and validity of a simpler language.
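As an illustration of how small such a check can be, here is a minimal Java sketch of a validation pass over a hypothetical line-based intermediate language (BEGIN name / END name / TEXT words / PARA). The language, its rules and the class name are invented for this example; only the idea of validating a simpler language comes from the paragraph above.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical second-pass checker for an invented intermediate language:
//   BEGIN <name> | END <name> | TEXT <words> | PARA
public class IntermediateChecker {

    /** Returns null if the program is well formed, otherwise an error message. */
    public static String check(List<String> lines) {
        Deque<String> open = new ArrayDeque<>(); // stack of currently open BEGIN blocks
        for (int i = 0; i < lines.size(); i++) {
            String[] parts = lines.get(i).split(" ", 2);
            String op = parts[0];
            String arg = parts.length > 1 ? parts[1] : "";
            switch (op) {
                case "BEGIN":
                    open.push(arg);
                    break;
                case "END":
                    if (open.isEmpty() || !open.peek().equals(arg))
                        return "line " + (i + 1) + ": unexpected END " + arg;
                    open.pop();
                    break;
                case "PARA": // example rule: no paragraph break inside an open block
                    if (!open.isEmpty())
                        return "line " + (i + 1) + ": PARA inside " + open.peek();
                    break;
                case "TEXT":
                    break;
                default:
                    return "line " + (i + 1) + ": unknown instruction " + op;
            }
        }
        return open.isEmpty() ? null : "unclosed " + open.peek();
    }

    public static void main(String[] args) {
        System.out.println(check(Arrays.asList("TEXT Hello", "BEGIN emph", "TEXT world", "END emph", "PARA")));
        System.out.println(check(Arrays.asList("BEGIN emph", "PARA", "END emph")));
    }
}
```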
Also, if you use one or more 'helper' objects outside the parsers, you will find that the grammar files stay quite simple and are not cluttered with lots of actions (and are therefore easier to understand and debug).
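A minimal Java sketch of such a helper, assuming it owns the layout-to-structure policy that a tree-grammar action would otherwise have to spell out inline. The class name LayoutMapper, its methods and the mapping itself (bold, italic and underline to an emphasis marker; footnotes collected and emitted later) are hypothetical choices made for illustration, not decisions from this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper kept outside the grammars: grammar actions only call into it,
// so the mapping policy can change without touching any grammar file.
public class LayoutMapper {
    private final Map<String, String> markers = new HashMap<>();
    private final List<String> footnotes = new ArrayList<>();

    public LayoutMapper() {
        // Illustrative policy only; the real mapping is a project design decision.
        markers.put("b", "EMPH");
        markers.put("i", "EMPH");
        markers.put("ul", "EMPH");
    }

    /** Structural marker for an RTF control word, or null if it is layout-only. */
    public String markerFor(String controlWord) {
        return markers.get(controlWord);
    }

    /** Stores footnote text and returns a reference to place in the running text. */
    public String deferFootnote(String text) {
        footnotes.add(text);
        return "[" + footnotes.size() + "]";
    }

    /** Emits the collected footnotes (e.g. at the end of a chapter) and resets numbering. */
    public String flushFootnotes() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < footnotes.size(); i++) {
            sb.append('[').append(i + 1).append("] ").append(footnotes.get(i)).append('\n');
        }
        footnotes.clear();
        return sb.toString();
    }

    public static void main(String[] args) {
        LayoutMapper mapper = new LayoutMapper();
        System.out.println(mapper.markerFor("b"));  // a structural marker
        System.out.println(mapper.markerFor("fs")); // font size: layout only, no marker
        String ref = mapper.deferFootnote("See chapter 2.");
        System.out.print("Some text" + ref + "\n" + mapper.flushFootnotes());
    }
}
```

With this split, an action in the tree grammar shrinks to a single call such as mapper.markerFor(word), and questions like "where do footnotes go?" are answered in ordinary Java that can be unit-tested on its own.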

[My application is a simulation engine that runs within a Java application. Its source-code parser/AST/template step produces text 'assembler code' for an abstract stack machine; this is then processed to produce binary code for the stack machine.]

Divide and rule!
Regards,
Tim

On 6/25/07, Daniel Warner <dwarner at uni-paderborn.de> wrote:
Hello,

I'm studying computer science and mathematics in Paderborn, and I'm currently working on a university project whose goal is to transform RTF documents into a text-based representation of braille called HBS.

The output format HBS is already specified, although there is no grammar (it is all embedded in an existing application, which I have to reverse-engineer). Of course, much of the information in an RTF document is irrelevant for blind readers and will therefore have to be eliminated. HBS encodes a lot of structural information but far less layout information, so I will also face problems such as: what do I do with footnotes, how should I represent text that is colored red (maybe even inconsistently), how can I map layout to structure appropriately, etc., to mention just a few.

As I want to implement the RTF-HBS-Parser in Java, I naturally looked for parser generators for this language. To me, ANTLR v3 seems to be the most promising approach in this area, and I really appreciate Prof. Parr publishing his tool under the BSD License.

I have already bought his book "The Definitive ANTLR Reference" (and the PDF) and have a question concerning the "big picture" of my project:

Microsoft's RTF specification 1.9 is huge. What approach would you suggest for parsing RTF with ANTLR into the text-based representation of braille (HBS) mentioned above?

1) Use actions in the grammar rules?
2) Create an AST from the RTF input and a tree grammar for the AST that outputs HBS?
3) Use templates?
4) Other suggestions?

Thanks a lot in advance for any hints that help me get started with my work,

Daniel Warner
