[antlr-interest] Parsing huge files a chunk at a time
John Woods
jqwoods at gmail.com
Thu Nov 6 18:09:29 PST 2008
Hello,
I'd like to parse files which are much larger than my target memory
footprint. The files are composed of many small independently parsable
chunks. I only need to parse one of these chunks at a time, dispose of
the parsed result, then parse the next chunk.
The first hurdle I find is that ANTLRReaderStream reads the entire file
into a memory buffer upon construction. So I thought I'd implement a
MyOwnReaderStream class that implements the CharStream interface such
that only as many characters as the look-ahead requires are read into
memory (using a relatively small look-ahead buffer backed by a
RandomAccessFile to support long-distance marks, etc.).
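A minimal sketch of what I have in mind, using a hypothetical stand-in
class rather than ANTLR's actual CharStream interface (LazyCharStream
and its method names are my own; it also treats bytes as characters for
simplicity). mark() is just the current file offset, and rewind() seeks
back to it, so memory stays bounded by the buffer size:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Hypothetical lazily-buffered character stream (NOT ANTLR's CharStream).
 * Reads a small window of the file on demand; mark/rewind are implemented
 * as plain file offsets, so long-distance rewinds just re-seek and refill.
 */
class LazyCharStream {
    public static final int EOF = -1;
    private final RandomAccessFile file;
    private final byte[] buf;
    private long bufStart = 0;   // file offset of buf[0]
    private int bufLen = 0;      // number of valid bytes in buf
    private long pos = 0;        // current read position in the file

    LazyCharStream(RandomAccessFile file, int bufSize) {
        this.file = file;
        this.buf = new byte[bufSize];
    }

    /** Look ahead i characters (1-based), refilling the window as needed. */
    int LA(int i) throws IOException {
        long target = pos + i - 1;
        if (target < bufStart || target >= bufStart + bufLen) {
            fill(target);
        }
        if (target >= bufStart + bufLen) return EOF;
        return buf[(int) (target - bufStart)] & 0xFF;
    }

    void consume() { pos++; }

    /** mark() is just the current offset; rewind() seeks back to it. */
    long mark() { return pos; }
    void rewind(long marker) { pos = marker; }

    private void fill(long from) throws IOException {
        file.seek(from);
        bufStart = from;
        bufLen = Math.max(0, file.read(buf));  // read() returns -1 at EOF
    }
}
```

With a 4-byte window over a 7-byte file, look-ahead past the window
triggers a refill, and rewinding before the window seeks backward and
refills from the mark.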
However, it seems that even if I did that, the next hurdle would be
CommonTokenStream.fillBuffer(), which again tokenizes the entire input
into memory even though most of it isn't needed yet.
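What I'd want instead is something like the sketch below: a token
stream that pulls tokens from the lexer only as the parser looks ahead,
and discards already-consumed tokens whenever no mark is active. This
is a hypothetical class of my own (String tokens, a Supplier standing
in for the lexer's nextToken()), not ANTLR's CommonTokenStream:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical on-demand token stream (NOT ANTLR's CommonTokenStream).
 * Tokens are lexed one at a time as the parser asks for them; when no
 * mark pins the window, consumed tokens are dropped, so memory is
 * bounded by the look-ahead plus any active backtracking window.
 */
class OnDemandTokenStream {
    static final String EOF = "<EOF>";
    private final Supplier<String> lexer;       // stands in for nextToken()
    private final List<String> window = new ArrayList<>();
    private int windowStart = 0;                // absolute index of window.get(0)
    private int pos = 0;                        // absolute index of next token
    private int marks = 0;                      // active marks pin the window
    private boolean done = false;               // end of input reached

    OnDemandTokenStream(Supplier<String> lexer) { this.lexer = lexer; }

    /** Look ahead k tokens (1-based), lexing only as far as needed. */
    String LT(int k) {
        int target = pos + k - 1;
        while (!done && windowStart + window.size() <= target) {
            String t = lexer.get();             // null means end of input
            if (t == null) { done = true; window.add(EOF); }
            else window.add(t);
        }
        int idx = target - windowStart;
        return idx < window.size() ? window.get(idx) : EOF;
    }

    void consume() {
        pos++;
        // With no active marks, drop tokens the parser can never revisit.
        if (marks == 0 && pos > windowStart) {
            window.subList(0, pos - windowStart).clear();
            windowStart = pos;
        }
    }

    int mark() { marks++; return pos; }
    void release(int marker) { marks--; }
    void rewind(int marker) { pos = marker; }

    int bufferedTokens() { return window.size(); }  // for inspection
}
```

Consuming with no marks active shrinks the window to just the current
look-ahead, which is the bounded-memory behavior I'm after.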
Is there a way to only consume memory on the order of the size of the
rule I'm attempting to parse, and not on the order of the size of the file?
Thanks for any tips!