[antlr-interest] parsing a very large file
Loring Craymer
lgcraymer at yahoo.com
Thu Mar 26 14:15:17 PDT 2009
Memory mapping the file would probably help, as would the incremental lexing that Gavin suggests.
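
As a rough illustration of the memory-mapping idea, here is a minimal sketch using only the standard `java.nio` classes. The class name `MappedInputSketch` and the helper `countLines` are made up for this example. It only shows how to map a file read-only so the OS pages it in on demand. Feeding the buffer to an ANTLR lexer would presumably need a custom `CharStream` implementation on top of it, which is not shown here.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch class; not part of ANTLR.
class MappedInputSketch {
    // Maps the whole file read-only. The OS loads pages on demand instead of
    // the JVM copying the file into a heap array up front.
    // Note: a single mapping is limited to Integer.MAX_VALUE bytes.
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // Counts newline bytes by walking the mapped buffer with absolute reads.
    static long countLines(MappedByteBuffer buf) {
        long n = 0;
        for (int i = 0; i < buf.limit(); i++) {
            if (buf.get(i) == '\n') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("mapped", ".txt");
        p.toFile().deleteOnExit();
        Files.write(p, "a\nb\nc\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(countLines(map(p)));
    }
}
```

Whether this actually reduces the footprint depends on what the lexer and parser keep alive afterwards; the mapping only avoids the up-front copy of the raw input.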
--Loring
----- Original Message ----
> From: Gavin Lambert <antlr at mirality.co.nz>
> To: Vladimir Konrad <vladimir at ok2home.net>; antlr-interest at antlr.org
> Sent: Thursday, March 26, 2009 2:06:11 PM
> Subject: Re: [antlr-interest] parsing a very large file
>
> At 05:10 27/03/2009, Vladimir Konrad wrote:
> >
> >I have read in the ANTLR book (from pragmatic book-store) that
> >ANTLR always loads entire file/stream into memory. Is this
> >still the case?
>
> Yes, by default.
>
> >I would need to load a data file which is quite large (100MB+) but
> >parsing it with ANTLR uses over 1GB of RAM. Is there any way to
> >use ANTLR to load such a data file without so big memory consumption?
> >If not what other (java) options are there?
>
> Have a look in the Wiki. I believe that there's some info there
> regarding overriding the token stream to not preload all the
> tokens; that will keep the initial memory budget down.
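
To make the "don't preload all the tokens" idea concrete, here is a small stand-alone sketch. It does not use the ANTLR runtime or reproduce the Wiki code. `LazyTokensSketch` and `Tok` are hypothetical names, and the "lexer" simply splits on whitespace. It shows a token source that produces one token per request, with each token holding only offsets into the input rather than a copy of the text.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch class; not the ANTLR token stream API.
class LazyTokensSketch {
    // A token records where its text lives in the input instead of copying it.
    static final class Tok {
        final int start; // inclusive offset
        final int stop;  // exclusive offset

        Tok(int start, int stop) {
            this.start = start;
            this.stop = stop;
        }

        String text(CharSequence in) {
            return in.subSequence(start, stop).toString();
        }
    }

    // Returns an iterator that scans for the next whitespace-separated token
    // only when the caller asks for it; nothing is buffered ahead of time.
    static Iterator<Tok> tokens(final CharSequence in) {
        return new Iterator<Tok>() {
            private int pos = 0;

            private void skipSpace() {
                while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) {
                    pos++;
                }
            }

            @Override
            public boolean hasNext() {
                skipSpace();
                return pos < in.length();
            }

            @Override
            public Tok next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int start = pos;
                while (pos < in.length() && !Character.isWhitespace(in.charAt(pos))) {
                    pos++;
                }
                return new Tok(start, pos);
            }
        };
    }

    public static void main(String[] args) {
        String input = "alpha beta  gamma";
        Iterator<Tok> it = tokens(input);
        while (it.hasNext()) {
            System.out.println(it.next().text(input));
        }
    }
}
```

A real parser needs some lookahead and possibly backtracking, so an actual on-demand token stream would have to keep a small window of recent tokens rather than none at all.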
>
> However I don't think there's any way to avoid having it load
> everything in simultaneously at some point -- the tokens don't
> actually copy the data, they just hold references to its position
> in the input stream (and of course even if they did copy it that'd
> still end up taking up just as much memory). You certainly
> wouldn't be able to produce an AST without having the entire input
> file in memory and tokenised.
>
> Is there some way you can split up the input externally from ANTLR
> first? I've used it before to parse some large data files
> (~80MB), but they were archive files that contained chunks of
> about ~500KB each that could be parsed independently of each
> other, so it was fairly straightforward.
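
The external splitting could look something like the sketch below. It assumes, purely for illustration, that independent chunks are separated by blank lines; the real archive format is not described in this thread. `ChunkSplitSketch` and `forEachChunk` are made-up names, and the handler is where a separate lexer/parser run per chunk would go.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

// Hypothetical sketch class; the blank-line delimiter is an assumption.
class ChunkSplitSketch {
    // Streams the input line by line and hands each blank-line-delimited
    // chunk to the handler. Only the current chunk's text is held in memory.
    // Returns the number of chunks processed.
    static int forEachChunk(Reader source, Consumer<String> handler) throws IOException {
        BufferedReader in = new BufferedReader(source);
        StringBuilder chunk = new StringBuilder();
        int count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                // A blank line closes the current chunk, if there is one.
                if (chunk.length() > 0) {
                    handler.accept(chunk.toString());
                    chunk.setLength(0);
                    count++;
                }
            } else {
                chunk.append(line).append('\n');
            }
        }
        // Flush a trailing chunk that was not followed by a blank line.
        if (chunk.length() > 0) {
            handler.accept(chunk.toString());
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Reader r = new StringReader("a\nb\n\nc\n\n\nd");
        int n = forEachChunk(r, c -> System.out.println("chunk of " + c.length() + " chars"));
        System.out.println(n + " chunks");
    }
}
```

This only works when the chunks really can be parsed independently of each other, as in the archive case described above.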
>