[antlr-interest] Out of Memory
Gordon.Tyler at quest.com
Mon Oct 5 07:04:27 PDT 2009
Bear in mind that the on-disk representation of a data structure is most likely more efficient than the in-memory representation. So if you intend to load the entirety of your 1GB file into memory, you will probably be using more than 1GB of memory. This can be a problem on Win32 systems where it becomes difficult to allocate more than ~1.4GB of memory to the Java heap.
Unless there are some funky options in ANTLR which allow you to throw away parts of the parsed data once you've processed them, it sounds like you might need a hand-written stream processor.
You could also try just generating and using an ANTLR lexer in combination with a stream processor. Since it generates a stream of tokens, it shouldn't have the same difficulties of needing to maintain parsed data structures in memory, while still doing some of the grunt-work required to process the file. Thus your hand-written parser should have better control over what stays in memory.
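As a rough illustration of the stream-processor idea, here is a minimal sketch in plain Java that reads a PGN file line by line and keeps only one game in memory at a time. The class name PgnSplitter and the method forEachGame are hypothetical and not part of ANTLR. The sketch assumes that a line starting with '[' which follows movetext begins the next game; that heuristic could misfire if a multi-line comment inside the movetext happens to contain a line starting with '['.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

/** Hypothetical helper (not an ANTLR class): streams PGN text one game at a time. */
class PgnSplitter {

    /**
     * Reads PGN text line by line and passes each complete game to onGame.
     * Assumption: a tag line ('[' at column 0) seen after movetext starts a new game.
     * Returns the number of games delivered.
     */
    static long forEachGame(BufferedReader in, Consumer<String> onGame) {
        StringBuilder game = new StringBuilder();
        boolean sawMovetext = false;
        long count = 0;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                boolean isTag = line.startsWith("[");
                if (isTag && sawMovetext) {
                    // The previous game is complete: hand it off and forget it.
                    onGame.accept(game.toString());
                    count++;
                    game.setLength(0);
                    sawMovetext = false;
                }
                if (!isTag && !line.trim().isEmpty()) {
                    sawMovetext = true;
                }
                game.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Deliver the last game, if the input did not end with blank lines only.
        if (!game.toString().trim().isEmpty()) {
            onGame.accept(game.toString());
            count++;
        }
        return count;
    }
}
```

The callback passed as onGame is where each game's text would be handed to a freshly created lexer/parser (or to a hand-written parser working on lexer tokens), so nothing from earlier games needs to stay reachable once the callback returns.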
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Mark Boylan
Sent: October 5, 2009 8:17 AM
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Out of Memory
It feels a little redundant, but I think that is the right solution.
On Mon, Oct 5, 2009 at 4:18 AM, Indhu Bharathi <indhu.b at s7software.com> wrote:
> Is it possible to write a separate program to break the PGN files into
> separate games and pass each game to the lexer/parser? That will be a simple
> solution assuming there is an easy way to split games in a PGN file.
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Mark Boylan
> Sent: Monday, October 05, 2009 4:49 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Out of Memory
> Answering my own question :)
> I wrote a class named ANTLRMemoryMappedFileStream which is nearly a
> complete ripoff of ANTLRStringStream, except backed by a memory-mapped
> file. That solved the buffering problem. But now, I get an
> OutOfMemoryError in the parser with my big test file.
> I'm parsing chess games in PGN (portable game notation) format. A game
> in PGN format is usually under 1k, but a PGN file can contain many
> games. Most PGN files have several thousand games and those are no
> problem for Antlr and my grammar. But, PGN files with a million or
> more games are not rare -- especially in the case where a database
> user wants to restore an entire collection, or move it to a new chess
> database management program (like the one I'm working on). So, it's
> important for me to be able to parse these huge files.
> I'm wondering if it's possible for the Parser to notify the Stream
> that a game has been parsed. At that point, the Stream implementation
> can flip the buffer. Does that sound like something that might work?
> Is that possible?
> On Sun, Oct 4, 2009 at 4:07 AM, Mark Boylan <boylan.mark at gmail.com> wrote:
>> My grammar is working really well with smaller test files, but I run
>> out of heap space on large files. Unfortunately, my users will expect
>> to be able to load pretty big files occasionally (~1GB).
>> Looking at the code documentation for the Antlr3 stream classes, it
>> looks like they copy the entire stream. I'm thinking that I need to
>> write a custom implementation of IntStream or CharStream that buffers
>> the input. Is that the right way to solve this? Can someone point me
>> in the right direction?
>> - mark
> List: http://www.antlr.org/mailman/listinfo/antlr-interest