[antlr-interest] Parsing RTF to Braille

Gerald B. Rosenberg gbr at newtechlaw.com
Mon Jun 25 12:33:10 PDT 2007


At 11:46 AM 6/25/2007, Daniel Warner wrote:
>The RTF specification 1.9 from Microsoft is huge. What approach 
>would you suggest in parsing RTF with ANTLR to the mentioned 
>text-based representation of braille (HBS)?
>
>1) Use actions in the grammar rules?
>2) Create an AST from the RTF input and a tree grammar for the AST 
>that outputs HBS?
>3) Use templates?
>4) Other suggestions?

I would strongly suggest implementing a PDF to HBS converter instead, 
if only to avoid the many differing and incomplete interpretations of 
the RTF spec.  The PDF spec is substantially smaller and far more 
uniformly implemented.  Conversion from RTF, and from many other 
document formats, to PDF can be automated with little difficulty.

In both RTF and PDF, top-down, left-to-right ordering of text is 
common but not required, so you can encounter "out of order" text even 
on simple pages.  Actions that emit output directly from the grammar 
rules are therefore unlikely to be useful.  If you need to handle 
footnotes, tables, and columns, an AST is the only way to go.  You will 
likely need multiple tree walkers to distinguish the different text 
blocks and reorganize the AST content into a reasonably consistent 
form.  Output from that point should be fairly linear, so templates 
should not be necessary.
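
To illustrate the "out of order" problem, here is a minimal sketch of 
one reordering pass, written in plain Python rather than as an ANTLR 
tree walker.  It assumes each text fragment has already been pulled 
out of the AST with a page position; the Fragment class, the 
reading_order function, and the line_tolerance parameter are 
hypothetical names made up for this example, not part of ANTLR or of 
either spec.  It handles only a single column; footnotes, tables, and 
multiple columns would need further passes.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A run of text with an assumed page position (hypothetical type)."""
    x: float   # distance from the left edge of the page
    y: float   # distance from the top of the page
    text: str


def reading_order(fragments, line_tolerance=2.0):
    """Return the text of each line in top-down, left-to-right order.

    Fragments whose y value is within line_tolerance of the first
    fragment on the current line are treated as the same line.
    """
    lines = []
    # Walk the fragments from the top of the page downward.
    for frag in sorted(fragments, key=lambda f: f.y):
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)   # same visual line
        else:
            lines.append([frag])     # start a new line
    # Within each line, order the fragments left to right.
    return [
        " ".join(f.text for f in sorted(line, key=lambda f: f.x))
        for line in lines
    ]
```

Fragments can be passed in any order, for example in the order they 
happen to appear in the input file; the result is one string per 
visual line, which could then be handed to a linear output stage.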

HTH,
Gerald
----
Gerald B. Rosenberg, Esq.
NewTechLaw
260 Sheridan Ave., Suite 208
Palo Alto, CA  94306-2009

650.325.2100  (office)  /  650.703.1724  (cell)
650.325.2107  (facsimile)

www.newtechlaw.com

