[antlr-interest] big XML file support

Mon May 15 16:01:22 PDT 2006

Brannon King wrote:
> Suppose I have a file that looks like this:
> 
> <a>
>   <b>
>     <c>
>       <d /> <d /> <d /> ... For a few GB worth
>     </c>
>     <c binary="true">
> <![CDATA[ about 10GB of binary data ]]>
>     </c>
>   </b>
> </a>
> 
> I need a parser to go through and build up a structure with the tree but
> without any <d> or binary data. Instead, I just want to record the file
> locations for those and I'll go pull them from the file when I need them. Is
> ANTLR a good tool to do that or am I better off parsing by hand? Or should I
> use Xerces? Or, the real question, does ANTLR have some ability to do
> XML-type structures easily? What are the largest files you've parsed using
> ANTLR? I'm using C++. Thanks for your time.

I know I'm sidestepping the issue, but it might be worth using a
dedicated XML parser. Xerces-C++ from Apache is probably a good bet for
you, and it supports SAX, DOM, and DOMasSAX, IIRC. Otherwise it's easy
to write a DOMasSAX layer yourself...

Sam