[antlr-interest] Bounding the token stream in the C backend

Thu Feb 25 07:30:42 PST 2010

Hi Christopher,

I am not entirely sure, but you may have run into the same problem as I did
a while ago. You may want to have a look at the discussion thread from back
then for some advice:
http://www.antlr.org/pipermail/antlr-interest/2009-April/034125.html
Although I used the simple solution Jim suggested (parsing only the headers
and using custom code for the rest of the file), some of the advice in that
thread might still be helpful.

Hope this helps,

Nikos

On Thu, Feb 25, 2010 at 6:09 AM, Christopher L Conway <cconway at cs.nyu.edu> wrote:

> I've got a large input file (~39MB) that I'm attempting to parse with
> an ANTLR3-generated C parser. The parser is using a huge amount of
> memory (~3.7GB) and seems to start thrashing without making much
> progress towards termination. I found a thread from earlier this month
> (http://markmail.org/message/jfngdd2ci6h7qrbo) suggesting the most
> likely cause of such behavior is a parser bug, but I've stepped
> through the code and it seems to be lexing just fine. Rather, it seems
> the problem is that fillBuffer() is tokenizing the whole file in one
> go; then, the parsing rules slow to a crawl because the token buffer
> is sitting on all the memory.
>
> I wonder if there is a way to change fillBuffer()'s behavior so that
> it lexes only a bounded number of tokens before allowing parsing to
> proceed.
>
> Thanks,
> Chris