[antlr-interest] Bounding the token stream in the C backend

Christopher L Conway cconway at cs.nyu.edu
Wed Feb 24 21:09:09 PST 2010


I've got a large input file (~39MB) that I'm attempting to parse with
an ANTLR3-generated C parser. The parser is using a huge amount of
memory (~3.7GB) and seems to start thrashing without making much
progress towards termination. I found a thread from earlier this month
(http://markmail.org/message/jfngdd2ci6h7qrbo) suggesting the most
likely cause of such behavior is a parser bug, but I've stepped
through the code and the lexer is producing tokens just fine. Rather,
the problem appears to be that fillBuffer() tokenizes the whole file
in one go; the parsing rules then slow to a crawl because the token
buffer is sitting on all the memory.
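
For reference, my driver is essentially the stock ANTLR3 C setup, along the
lines of the sketch below (the grammar and start-rule names are placeholders
for my own). As far as I can tell, the parser's first token request is what
triggers fillBuffer(), which lexes the entire 39MB file into the common token
stream before any rule gets to run:

    #include "MyGrammarLexer.h"    /* generated headers; names are placeholders */
    #include "MyGrammarParser.h"

    int main(int argc, char **argv)
    {
        /* Open the input file as an ANTLR3 ASCII input stream. */
        pANTLR3_INPUT_STREAM input =
            antlr3AsciiFileStreamNew((pANTLR3_UINT8) argv[1]);

        /* The lexer feeds a common token stream, which buffers tokens. */
        pMyGrammarLexer lexer = MyGrammarLexerNew(input);
        pANTLR3_COMMON_TOKEN_STREAM tokens =
            antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, TOKENSOURCE(lexer));

        /* The parser's first LT(1) request is where fillBuffer() tokenizes
         * the whole input in one go. */
        pMyGrammarParser parser = MyGrammarParserNew(tokens);
        parser->translation_unit(parser);   /* placeholder start rule */

        parser->free(parser);
        tokens->free(tokens);
        lexer->free(lexer);
        input->close(input);
        return 0;
    }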

Is there a way to change fillBuffer()'s behavior so that it lexes only
a bounded number of tokens before allowing parsing to proceed?
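
To make the question concrete: the lexer has no trouble producing tokens one
at a time when I drive it directly, so what I'm really after is a token
stream that pulls from the token source on demand (or in bounded chunks)
rather than all at once. Roughly the behavior I mean, as a loop over the
token source (only an illustration, not a working token stream):

    /* Pull tokens straight from the lexer's token source, a bounded
     * chunk at a time.  A real solution would have to hand these tokens
     * to the parser through a pANTLR3_TOKEN_STREAM implementation and
     * release the ones already consumed. */
    pANTLR3_TOKEN_SOURCE src = TOKENSOURCE(lexer);
    ANTLR3_UINT32 count = 0;

    for (;;)
    {
        pANTLR3_COMMON_TOKEN tok = src->nextToken(src);

        if (tok->getType(tok) == ANTLR3_TOKEN_EOF)
        {
            break;
        }

        if (++count % 10000 == 0)
        {
            /* Ideally, parsing would proceed here and the tokens
             * consumed so far could be freed. */
        }
    }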

Thanks,
Chris

