[antlr-interest] Parsing huge files a chunk at a time
Johannes Luber
jaluber at gmx.de
Mon Nov 10 04:20:24 PST 2008
John Woods wrote:
> Hello,
>
> I'd like to parse files which are much larger than my target memory
> footprint. The files are composed of many small independently parsable
> chunks. I only need to parse one of these chunks at a time, dispose of
> the parsed result, then parse the next chunk.
>
> The first hurdle I find is that ANTLRReaderStream reads the entire file
> into a memory buffer upon construction. So, I thought I'd implement a
> MyOwnReaderStream class implementing the CharStream interface, such
> that only as many characters as the look-ahead requires are read into
> memory (using a relatively small look-ahead buffer backed by a
> RandomAccessFile to support long-distance marks, etc.).
>
> However, it seems that even if I did that, the next hurdle would be
> CommonTokenStream.fillBuffer(), which again tokenizes everything into
> memory even though it's not needed yet.
>
> Is there a way to only consume memory on the order of the size of the
> rule I'm attempting to parse, and not on the order of the size of the file?
>
> Thanks for any tips!
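Since the chunks are independently parsable, one way to sidestep both
buffers is to split the file yourself and hand each chunk to a fresh
lexer/parser. A rough, untested sketch against the ANTLR 3 Java runtime
(ChunkLexer, ChunkParser, the chunk entry rule, and the "%%" delimiter
are placeholders for your generated classes and your file format):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;

public class ChunkDriver {

    public static void main(String[] args)
            throws IOException, RecognitionException {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        try {
            StringBuilder chunk = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.equals("%%")) {       // placeholder chunk delimiter
                    parseChunk(chunk.toString());
                    chunk.setLength(0);        // discard the finished chunk
                } else {
                    chunk.append(line).append('\n');
                }
            }
            if (chunk.length() > 0) {          // trailing chunk, no delimiter
                parseChunk(chunk.toString());
            }
        } finally {
            in.close();
        }
    }

    private static void parseChunk(String text) throws RecognitionException {
        // ChunkLexer/ChunkParser stand in for your generated classes,
        // chunk() for your grammar's entry rule.
        ChunkLexer lexer = new ChunkLexer(new ANTLRStringStream(text));
        ChunkParser parser = new ChunkParser(new CommonTokenStream(lexer));
        parser.chunk();
        // char buffer, token buffer, and parse result all become
        // garbage here
    }
}

Every per-chunk buffer goes out of scope before the next chunk is read,
so memory use is on the order of the largest chunk, not the file.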
If you'd rather stay inside ANTLR's own streams: I don't know if
creating MyOwnReaderStream is necessary, but the important point is to
replace CommonTokenStream.fillBuffer(). Just derive a new class from
CommonTokenStream and override that method with your own handling.
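For example, a derived stream that stops filling at a chunk boundary
might look like this. This is an untested sketch against ANTLR 3.1's
CommonTokenStream (whose tokens, p, and skipOffTokenChannels() members
are protected); it omits the channel-override and token-discard handling
of the real fillBuffer(), and ChunkLexer.END_OF_CHUNK is a placeholder
for a token type your lexer emits at each chunk boundary:

import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.Token;
import org.antlr.runtime.TokenSource;

public class ChunkedTokenStream extends CommonTokenStream {

    public ChunkedTokenStream(TokenSource tokenSource) {
        super(tokenSource);
    }

    /** Buffer tokens only up to (and including) the next chunk
     *  boundary instead of tokenizing the whole file. */
    protected void fillBuffer() {
        int index = tokens.size();
        Token t = tokenSource.nextToken();
        while (t != null && t.getType() != Token.EOF) {
            t.setTokenIndex(index++);
            tokens.add(t);
            if (t.getType() == ChunkLexer.END_OF_CHUNK) {
                break;                 // stop at the chunk boundary
            }
            t = tokenSource.nextToken();
        }
        p = 0;
        p = skipOffTokenChannels(p);   // skip leading off-channel tokens
    }

    /** Drop the finished chunk; the next LT() call refills the buffer. */
    public void nextChunk() {
        tokens.clear();
        p = -1;
    }
}

Since LT() answers EOF once it runs past the end of the buffer, the
parser should stop cleanly at each chunk boundary (have the chunk rule
match END_OF_CHUNK, or emit that token on the hidden channel). Invoke
your entry rule, call nextChunk(), and repeat; paired with a lazily
reading CharStream, both buffers then stay proportional to one chunk.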
Johannes