[antlr-interest] ANTLR Peak Memory Issue

Manu Chopra mchopra at cadence.com
Wed Jan 18 22:27:35 PST 2012


We are writing a parser using ANTLR, but we are seeing unusually high peak memory usage. Some key points:
*       We use ANTLR C target. Our application is C++.
*       There are no references to $text etc. We directly use the token start and end pointers to form strings ourselves, where required.
*       The grammar is LL(2).
*       We observe that the token structure itself is bulky: around 264 bytes on a 64-bit platform.
*       Further, it looks like ANTLR tokenizes the entire source upfront. Combined with the large token size, this puts memory usage near its peak even before real parsing begins.
*       I went through some earlier posts on the forum and saw the idea of partitioning the source file lexically and processing it in parts. The problem is that even the lexically identifiable sections of our sources are large enough to cause the memory problem.

Can you suggest options we can explore to reduce the peak memory? Is there a token stream implementation that keeps only some constant number of tokens in memory, as opposed to all of them?

I also understand that active development is under way on ANTLR 4.x:
*       Can you share some information on the likely availability of a C/C++ target? An approximate time frame is fine.
*       Is the token size going to be smaller in 4.x?
*       Will it still require that all text be tokenized upfront?

Thank you,
-Manu.

