[antlr-interest] parsing a very large file
Gavin Lambert
antlr at mirality.co.nz
Thu Mar 26 14:06:11 PDT 2009
At 05:10 27/03/2009, Vladimir Konrad wrote:
>
>I have read in the ANTLR book (from pragmatic book-store) that
>ANTLR always loads entire file/stream into memory. Is this
>still the case?
Yes, by default.
>I would need to load a data file which is quite large (100MB+),
>but parsing it with ANTLR uses over 1GB of RAM. Is there any way
>to use ANTLR to load such a data file without such large memory
>consumption? If not, what other (Java) options are there?
Have a look at the Wiki. I believe there's some information there
about overriding the token stream so that it doesn't preload all
the tokens; that will keep the initial memory footprint down.
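The idea behind that custom token stream can be illustrated without ANTLR's own API (the class below is illustrative only, not an ANTLR class): lex one token at a time from a Reader on demand, so only the current token is held in memory instead of a fully buffered token list.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Illustrative sketch only: a streaming tokenizer that produces one token
// per call instead of preloading the whole token list, which is the
// approach the wiki describes for a custom ANTLR token stream.
class StreamingTokenizer {
    private final Reader in;
    private int lookahead; // one character of lookahead, -1 at EOF

    StreamingTokenizer(Reader in) throws IOException {
        this.in = in;
        this.lookahead = in.read();
    }

    /** Returns the next whitespace-delimited token, or null at end of input. */
    String nextToken() throws IOException {
        while (lookahead != -1 && Character.isWhitespace(lookahead)) {
            lookahead = in.read();
        }
        if (lookahead == -1) return null;
        StringBuilder sb = new StringBuilder();
        while (lookahead != -1 && !Character.isWhitespace(lookahead)) {
            sb.append((char) lookahead);
            lookahead = in.read();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        StreamingTokenizer t =
            new StreamingTokenizer(new StringReader("a bb  ccc"));
        String tok;
        while ((tok = t.nextToken()) != null) {
            System.out.println(tok);
        }
    }
}
```

Note this only bounds the memory used by the *token buffer*; as the next paragraph says, the underlying character stream still ends up in memory.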
However, I don't think there's any way to avoid having the entire
input in memory at some point -- the tokens don't actually copy
the data; they just hold references to positions in the input
stream (and even if they did copy it, that would still take just
as much memory). You certainly wouldn't be able to produce an AST
without having the entire input file in memory and tokenised.
Is there some way you can split up the input externally, before it
reaches ANTLR? I've used ANTLR before to parse some large data
files (~80MB), but they were archive files containing chunks of
about ~500KB each that could be parsed independently of each
other, so it was fairly straightforward.