[antlr-interest] parsing a very large file

Loring Craymer lgcraymer at yahoo.com
Thu Mar 26 14:15:17 PDT 2009


Memory mapping the file would probably help, as would the incremental lexing that Gavin suggests.
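
As a rough illustration of the memory-mapping idea, here is a minimal sketch that uses only the JDK (java.nio). The class and method names (MappedInput, map, countLines) are made up for this example and are not part of ANTLR. It only shows how to map a file read-only so that the OS pages data in on demand instead of the JVM copying the whole file onto the heap. Feeding the mapped buffer to a lexer would still need a custom CharStream implementation, which is not shown here.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not an ANTLR class.
final class MappedInput {

    /** Maps the whole file read-only; pages are loaded lazily by the OS. */
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping cannot exceed Integer.MAX_VALUE bytes (about 2GB).
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Example scan over the mapped bytes: counts '\n' characters. */
    static long countLines(MappedByteBuffer buf) {
        long lines = 0;
        for (int i = 0; i < buf.limit(); i++) {
            if (buf.get(i) == '\n') {
                lines++;
            }
        }
        return lines;
    }
}
```

The mapped bytes live outside the Java heap, so this mainly helps with the cost of holding the raw input; the tokens and any AST built from them still take heap space.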

--Loring



----- Original Message ----
> From: Gavin Lambert <antlr at mirality.co.nz>
> To: Vladimir Konrad <vladimir at ok2home.net>; antlr-interest at antlr.org
> Sent: Thursday, March 26, 2009 2:06:11 PM
> Subject: Re: [antlr-interest] parsing a very large file
> 
> At 05:10 27/03/2009, Vladimir Konrad wrote:
> >
> >I have read in the ANTLR book (from pragmatic book-store) that
> >ANTLR always loads entire file/stream into memory. Is this
> >still the case?
> 
> Yes, by default.
> 
> >I would need to load a data file which is quite large (100MB+), but
> >parsing it with ANTLR uses over 1GB of RAM. Is there any way to
> >use ANTLR to load such a data file without such high memory
> >consumption?
> >If not, what other (Java) options are there?
> 
> Have a look in the Wiki.  I believe that there's some info there 
> regarding overriding the token stream to not preload all the 
> tokens; that will keep the initial memory budget down.
> 
> However I don't think there's any way to avoid having it load 
> everything in simultaneously at some point -- the tokens don't 
> actually copy the data, they just hold references to its position 
> in the input stream (and of course even if they did copy it that'd 
> still end up taking up just as much memory).  You certainly 
> wouldn't be able to produce an AST without having the entire input 
> file in memory and tokenised.
> 
> Is there some way you can split up the input externally from ANTLR 
> first?  I've used it before to parse some large data files 
> (~80MB), but they were archive files that contained chunks of 
> about ~500KB each that could be parsed independently of each 
> other, so it was fairly straightforward.
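
The external splitting described above could look something like the following sketch. It assumes, purely for illustration, that independent chunks are separated by blank lines; the real delimiter depends on the data format. ChunkSplitter and forEachChunk are hypothetical names, and the handler is where a separate lexer/parser run per chunk would go. Only one chunk is held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

// Hypothetical helper, not an ANTLR class.
final class ChunkSplitter {

    /**
     * Streams the input line by line and passes each blank-line-delimited
     * chunk to the handler. Returns the number of chunks handled.
     */
    static int forEachChunk(Reader in, Consumer<String> handler) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder chunk = new StringBuilder();
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                // Blank line: the current chunk (if any) is complete.
                if (chunk.length() > 0) {
                    handler.accept(chunk.toString());
                    chunk.setLength(0);
                    count++;
                }
            } else {
                chunk.append(line).append('\n');
            }
        }
        // Flush the last chunk if the input did not end with a blank line.
        if (chunk.length() > 0) {
            handler.accept(chunk.toString());
            count++;
        }
        return count;
    }
}
```

This only works when chunks really can be parsed independently of each other, as in the archive-file case; it does not help if the grammar needs context that spans the whole file.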
> 
> 



      

