[antlr-interest] parsing huge files

cimbroken cimbroken at gmail.com
Wed Oct 3 09:05:12 PDT 2007


Hi everybody!

I'm trying to use ANTLR for a task I've done until now with Perl regexps:
parsing huge log files.
My main goal is to write the parsers in a more declarative fashion, instead
of hand-writing piles of regexps, while(<>) loops and if-elsif chains for
every different log format.

I think my problem is ANTLR's CommonTokenStream: its fillBuffer() method
tries to buffer *all* tokens from the lexer, while what I need is to parse
one log record at a time, discard the old tokens, read a batch of new ones
(log files are repetitive and made up of independent records), and keep
going like that.
With input files of a gigabyte or more, that stream fills up memory before
the parser even starts doing its work!
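
To make it concrete, my driver is basically the standard ANTLR 3 boilerplate;
MyLogLexer, MyLogParser and the logFile rule below are just placeholder names
for my generated classes:

    import org.antlr.runtime.ANTLRFileStream;
    import org.antlr.runtime.CommonTokenStream;

    public class ParseWholeFile {
        public static void main(String[] args) throws Exception {
            MyLogLexer lexer = new MyLogLexer(new ANTLRFileStream(args[0]));
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            MyLogParser parser = new MyLogParser(tokens);
            parser.logFile(); // the first token lookahead calls fillBuffer(),
                              // which lexes the *entire* file into memory
        }
    }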

I'm neither a parsing expert nor an ANTLR expert, so I'm asking for advice
on changing this behaviour (or at least for someone to say: "don't use ANTLR
for this!"). Can the trick be done at the grammar level, or do I have to
subclass CommonTokenStream or ANTLRInputStream, or... ?
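
The only workaround I can think of so far is slicing the records out myself,
before ANTLR ever sees the file, and running a fresh lexer/parser per record,
roughly like the sketch below (blank-line-separated records and the class/rule
names are just assumptions for the example), but that feels like doing half
the job by hand again:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import org.antlr.runtime.ANTLRStringStream;
    import org.antlr.runtime.CommonTokenStream;

    public class RecordAtATime {
        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            StringBuilder record = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().length() == 0) {   // blank line = end of record (assumption)
                    if (record.length() > 0) {
                        parseRecord(record.toString());
                        record.setLength(0);       // forget the old record entirely
                    }
                } else {
                    record.append(line).append('\n');
                }
            }
            if (record.length() > 0) parseRecord(record.toString()); // last record
            in.close();
        }

        // A fresh token stream per record means CommonTokenStream only ever
        // buffers one record's worth of tokens, so memory use stays flat.
        static void parseRecord(String text) throws Exception {
            MyLogLexer lexer = new MyLogLexer(new ANTLRStringStream(text));
            MyLogParser parser = new MyLogParser(new CommonTokenStream(lexer));
            parser.record(); // hypothetical top-level rule for a single record
        }
    }

Is there a cleaner way to get the same bounded-memory behaviour inside ANTLR
itself?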

thanks a lot!

MC