[antlr-interest] Out of Memory

Mon Oct 5 16:37:05 PDT 2009

On Mon, Oct 5, 2009 at 7:16 AM, Mark Boylan <boylan.mark at gmail.com> wrote:
> It feels a little redundant, but I think that is the right solution.
>

Assuming you have something like: game_list: game*;

Couldn't you do that in the lexer/parser?  Just don't match EOF in the
start rule.  Then you can have something like:

parser.game_prefix();
while (game_or_end_return = parser.game_or_end()) {
    // Process the game here.
    // Make sure you didn't hit the end case here.
}
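
To make the shape of that loop concrete, here is a minimal Java sketch.
GameParser, fakeParser and processAll are hypothetical stand-ins written
for this example, not ANTLR classes, and it assumes the per-game rule can
signal the end case by returning null. A real generated parser would
return whatever your grammar declares.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class GameLoopSketch {

    // Hypothetical stand-in for the generated parser; NOT an ANTLR API.
    interface GameParser {
        void gamePrefix();   // consumes anything that precedes the first game
        String gameOrEnd();  // returns the next game, or null for the end case
    }

    // Fake parser backed by a list, so the sketch runs without a grammar.
    static GameParser fakeParser(List<String> games) {
        final Iterator<String> it = games.iterator();
        return new GameParser() {
            public void gamePrefix() { /* nothing to skip in the fake */ }
            public String gameOrEnd() { return it.hasNext() ? it.next() : null; }
        };
    }

    // Pulls one game at a time; only the current game is referenced,
    // so earlier games become eligible for garbage collection.
    static int processAll(GameParser parser) {
        parser.gamePrefix();
        int processed = 0;
        String game;
        while ((game = parser.gameOrEnd()) != null) {
            // Store or index `game` here, then let the reference go.
            processed++;
        }
        return processed;
    }

    static int demo() {
        return processAll(fakeParser(Arrays.asList("1. e4 e5", "1. d4 d5", "1. c4 c5")));
    }

    public static void main(String[] args) {
        System.out.println("games processed: " + demo());
    }
}
```

The point is only that the caller drives the iteration, so nothing forces
the whole game list to exist at once on the parser side.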

That might not make the lexer release everything it has buffered, but I
thought it would at least keep the parser from holding every game at once.

I haven't actually tried it with huge inputs, but I thought that was the
way it worked.  If it doesn't actually clean up after itself, maybe save
your position in the file when the rule returns, then create a new lexer
and seek to that position.  I think you're correct: it's silly to write
something smart enough to split the files by hand when you have a grammar
capable of dealing with it.
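
The save-and-seek fallback could look roughly like the Java sketch below.
It is not tested against ANTLR: the parseWindow callback is a hypothetical
stand-in for "build a fresh lexer and parser over this window and report
how many characters were fully consumed", and it assumes single-byte text
(such as ISO 8859-1) so that character counts can be added to byte offsets.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.ToIntFunction;

class ResumeFromOffset {

    // Reads up to maxBytes starting at offset and decodes them one byte per char.
    static String readWindow(RandomAccessFile file, long offset, int maxBytes) throws IOException {
        file.seek(offset);
        byte[] buf = new byte[maxBytes];
        int total = 0;
        while (total < maxBytes) {
            int n = file.read(buf, total, maxBytes - total);
            if (n < 0) break; // end of file
            total += n;
        }
        return new String(buf, 0, total, StandardCharsets.ISO_8859_1);
    }

    // Repeatedly hands a window to the (hypothetical) parser callback and
    // advances the saved offset by however much it consumed.
    static long processFile(String path, int windowBytes, ToIntFunction<String> parseWindow)
            throws IOException {
        long offset = 0;
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            while (offset < file.length()) {
                String window = readWindow(file, offset, windowBytes);
                int consumed = parseWindow.applyAsInt(window);
                if (consumed <= 0) break; // no progress: window too small or bad input
                offset += consumed;
            }
        }
        return offset;
    }

    // Demo with a tiny temp file; each "game" is one line, and the callback
    // consumes up to and including the last newline in the window.
    static long demo() {
        try {
            Path p = Files.createTempFile("resume", ".txt");
            try {
                Files.write(p, "aaa\nbbb\nccc\n".getBytes(StandardCharsets.ISO_8859_1));
                return processFile(p.toString(), 5, w -> w.lastIndexOf('\n') + 1);
            } finally {
                Files.deleteIfExists(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("final offset: " + demo());
    }
}
```

The window has to be larger than the biggest single game; otherwise the
callback makes no progress and the loop stops early.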

Kirby

>
> On Mon, Oct 5, 2009 at 4:18 AM, Indhu Bharathi <indhu.b at s7software.com> wrote:
>> Is it possible to write a separate program to break the PGN files into
>> separate games and pass each game to the lexer/parser? That will be a simple
>> solution assuming there is an easy way to split games in a PGN file.
>>
>>
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org
>> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Mark Boylan
>> Sent: Monday, October 05, 2009 4:49 AM
>> To: antlr-interest at antlr.org
>> Subject: Re: [antlr-interest] Out of Memory
>>
>> Answering my own question :)
>>
>> I wrote a class named ANTLRMemoryMappedFileStream which is nearly a
>> complete ripoff of ANTLRStringStream, except backed by a memory-mapped
>> file. That solved the buffering problem. But now, I get an
>> OutOfMemoryError in the parser with my big test file.
>>
>> I'm parsing chess games in PGN (portable game notation) format. A game
>> in PGN format is usually under 1k, but a PGN file can contain many
>> games. Most PGN files have several thousand games and those are no
>> problem for Antlr and my grammar. But, PGN files with a million or
>> more games are not rare -- especially in the case where a database
>> user wants to restore an entire collection, or move it to a new chess
>> database management program (like the one I'm working on). So, it's
>> important for me to be able to parse these huge files.
>>
>> I'm wondering if it's possible for the Parser to notify the Stream
>> that a game has been parsed. At that point, the Stream implementation
>> can flip the buffer. Does that sound like something that might work?
>> Is that possible?
>>
>>
>>
>> On Sun, Oct 4, 2009 at 4:07 AM, Mark Boylan <boylan.mark at gmail.com> wrote:
>>> Hi.
>>>
>>> My grammar is working really well with smaller test files, but I run
>>> out of heap space on large files. Unfortunately, my users will expect
>>> to be able to load pretty big files occasionally (~1GB).
>>>
>>> Looking at the code documentation for the Antlr3 stream classes, it
>>> looks like they copy the entire stream. I'm thinking that I need to
>>> write a custom implementation of IntStream or CharStream that buffers
>>> the input. Is that the right way to solve this? Can someone point me
>>> in the right direction?
>>>
>>> Thanks!
>>>
>>>  - mark
>>>
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe:
>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>
>>