[antlr-interest] Parsing huge files a chunk at a time

Johannes Luber jaluber at gmx.de
Mon Nov 10 04:20:24 PST 2008


John Woods wrote:
> Hello,
> 
> I'd like to parse files which are much larger than my target memory 
> footprint. The files are composed of many small independently parsable 
> chunks. I only need to parse one of these chunks at a time, dispose of 
> the parsed result, then parse the next chunk.
> 
> The first hurdle I find is that ANTLRReaderStream reads the entire file 
> into a memory buffer upon construction. So, I thought I'd implement a 
> MyOwnReaderStream class which implements the CharStream interface such 
> that only as many characters as the look-ahead requires are read into 
> memory (using a relatively small look-ahead buffer backed by a 
> RandomAccessFile to support long-distance marks, etc.).
> 
> However, it seems that even if I did that, the next hurdle would be 
> CommonTokenStream.fillBuffer(), which again tokenizes everything into 
> memory even though it's not needed yet.
> 
> Is there a way to only consume memory on the order of the size of the 
> rule I'm attempting to parse, and not on the order of the size of the file?
> 
> Thanks for any tips!

I don't know whether creating MyOwnReaderStream is necessary, but the
important point is to replace CommonTokenStream.fillBuffer(): derive your
own subclass of CommonTokenStream and override the buffering so that
tokens are fetched on demand instead of all at once.
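Roughly like this (an untested sketch against the ANTLR 3 Java runtime;
the class name LazyTokenStream and the batch size are made up here, and
it leans on CommonTokenStream's protected members tokens, p, tokenSource,
and skipOffTokenChannels()):

import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.Token;
import org.antlr.runtime.TokenSource;

public class LazyTokenStream extends CommonTokenStream {
    private static final int BATCH = 1024; // tokens per refill; tune as needed
    private boolean hitEOF = false;

    public LazyTokenStream(TokenSource source) {
        super(source);
    }

    // Replace "lex the whole file" with "lex the first batch only".
    // (Channel overrides and token discarding are ignored in this sketch.)
    @Override
    protected void fillBuffer() {
        fetch(BATCH);
        p = 0;
        p = skipOffTokenChannels(p); // move to the first on-channel token
    }

    // Pull up to n more tokens from the lexer into the buffer.
    private void fetch(int n) {
        for (int i = 0; i < n && !hitEOF; i++) {
            Token t = tokenSource.nextToken();
            if (t == null || t.getType() == Token.EOF) {
                hitEOF = true;
            }
            if (t != null) {
                t.setTokenIndex(tokens.size());
                tokens.add(t);
            }
        }
    }

    // Make sure the lookahead window is resident before delegating.
    @Override
    public Token LT(int k) {
        if (p == -1) fillBuffer();
        if (k > 0) {
            while (!hitEOF && p + k >= tokens.size()) {
                fetch(BATCH);
            }
            // consume() may have stopped at the old buffer end before
            // skipping hidden tokens; re-skipping here is idempotent.
            p = skipOffTokenChannels(p);
        }
        return super.LT(k);
    }
}

Pair this with your windowed CharStream and neither the characters nor
the tokens have to be resident all at once. Note that it only defers
lexing: the tokens list still grows as the parse advances, so to truly
cap memory you would also have to discard tokens the parser can no
longer rewind to, which means being careful with mark()/rewind() during
backtracking. Newer ANTLR runtimes include a BufferedTokenStream that
fetches tokens on demand in much this way.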

Johannes


