[antlr-interest] Parsing huge files a chunk at a time

John Woods jqwoods at gmail.com
Thu Nov 6 18:09:29 PST 2008


Hello,

I'd like to parse files that are much larger than my target memory 
footprint. The files are composed of many small, independently parsable 
chunks. I only need to parse one of these chunks at a time, dispose of 
the parsed result, then parse the next chunk.

The first hurdle I ran into is that ANTLRReaderStream reads the entire 
file into a memory buffer upon construction. So I thought I'd write a 
MyOwnReaderStream class implementing the CharStream interface, such 
that only as many characters are read into memory as the lookahead 
actually requests (using a relatively small lookahead buffer backed by 
a RandomAccessFile to support long-distance marks, etc.).

However, it seems that even if I did that, the next hurdle would be 
CommonTokenStream.fillBuffer(), which again tokenizes the entire input 
into memory even though most of it isn't needed yet.
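
So I imagine I'd need the same trick at the token level. Here's an 
untested sketch against the TokenStream interface as of 3.1 that pulls 
tokens from the lexer on demand and discards consumed ones once no 
mark() is outstanding. It assumes the lexer emits only default-channel 
tokens (hidden-channel filtering is omitted), and that marks are always 
paired with rewind(marker) as in the generated code:

import java.util.ArrayList;
import java.util.List;

import org.antlr.runtime.Token;
import org.antlr.runtime.TokenSource;
import org.antlr.runtime.TokenStream;

public class OnDemandTokenStream implements TokenStream {
    private final TokenSource source;
    private final List<Token> window = new ArrayList<Token>();
    private int base = 0;        // absolute index of window.get(0)
    private int p = 0;           // absolute index of the current token
    private int markDepth = 0;   // outstanding mark()s pin the window
    private int lastMarker = 0;

    public OnDemandTokenStream(TokenSource source) { this.source = source; }

    // Pull tokens until absolute index i (or EOF) is inside the window.
    private void sync(int i) {
        while (base + window.size() <= i) {
            Token t = source.nextToken();
            t.setTokenIndex(base + window.size());
            window.add(t);
            if (t.getType() == Token.EOF) break;
        }
    }

    public Token LT(int k) {
        if (k == 0) return null;
        if (k < 0) return get(p + k);   // LT(-1); see consume() below
        sync(p + k - 1);
        // clamp to the EOF token if the parser looks past end of input
        return get(Math.min(p + k - 1, base + window.size() - 1));
    }

    public Token get(int i) { return window.get(i - base); }
    public int LA(int k) { return LT(k).getType(); }

    public void consume() {
        sync(p);
        p++;
        // With no marks outstanding, keep only the previous token
        // (so LT(-1) still works) and let everything older be GC'd.
        if (markDepth == 0 && p - base > 1) {
            window.subList(0, p - base - 1).clear();
            base = p - 1;
        }
    }

    public int mark() { markDepth++; lastMarker = p; return p; }
    public void rewind(int marker) { markDepth--; p = marker; }
    public void rewind() { p = lastMarker; }
    public void release(int marker) { markDepth--; }
    public void seek(int index) { p = index; }
    public int index() { return p; }
    public int size() { return base + window.size(); }

    public TokenSource getTokenSource() { return source; }
    public String getSourceName() { return source.getSourceName(); }
    public String toString(int start, int stop) { return null; }
    public String toString(Token start, Token stop) { return null; }
}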

Is there a way to consume memory only on the order of the size of the 
rule I'm attempting to parse, rather than on the order of the size of 
the file?
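
One workaround I've considered is splitting the file outside ANTLR 
entirely: if chunk boundaries are cheap to find (say, blank lines), I 
can read one chunk's text and hand it to a fresh lexer/parser pair, so 
nothing larger than a chunk is ever buffered. Untested sketch, where 
MyLexer/MyParser stand for my generated classes and "chunk" is a 
hypothetical top-level rule:

import java.io.BufferedReader;
import java.io.FileReader;

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;

public class ChunkDriver {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        StringBuilder chunk = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().length() == 0) {    // blank line ends a chunk
                parseChunk(chunk);
            } else {
                chunk.append(line).append('\n');
            }
        }
        parseChunk(chunk);                      // last chunk, if any
        in.close();
    }

    private static void parseChunk(StringBuilder chunk) throws Exception {
        if (chunk.length() == 0) return;
        MyLexer lexer = new MyLexer(new ANTLRStringStream(chunk.toString()));
        MyParser parser = new MyParser(new CommonTokenStream(lexer));
        parser.chunk();     // use the result, then let it all be GC'd
        chunk.setLength(0);
    }
}

That keeps memory proportional to the chunk size, but it duplicates the 
chunk-boundary knowledge outside the grammar, which is what I was 
hoping to avoid.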

Thanks for any tips!

