[antlr-interest] Keeping all the data in RAM...

Sat Jun 4 18:08:10 PDT 2005

> > Can you define "common" and "extreme" in this context?

No offense guys, but from my experience, I think Ter, John and Loring's
definition of "common" is quite limited. This is based on many of the
discussions we've had on this list over the past few months as well as
previous experience.

I believe they feel that ANTLR is used mostly for single-run (possibly
multipass) translators, such as compilers.

This is most definitely not the case. While there are some tools like this,
ANTLR is very heavily used in applications that have a much longer lifespan,
and where its memory use can therefore have a much bigger performance impact.
(Apps such as server-side apps parsing expressions or translating things like
JSP; development apps like Checkstyle, which runs automatically every time I
save in Eclipse; and so forth.)

[Note: My usage is for XML parsing and parsing small expressions that appear
several times in a *very* memory-sensitive application, especially under
embedded constraints. Basically next-gen TiVo-like stuff. We're trying to trim
everything we can...]

Thinking of "common" in terms of the size of the input being parsed for this
question, I think the "common" input is reasonably sized (such as expression
strings [I think JSTL uses ANTLR] and Java source files).

For "extreme", I don't think 2G files are the issue. Keep in mind that
people don't run their Java apps with a 2G heap (most leave it at 64M and
don't even realize they can change it!). Heck, most folks don't have that much
RAM. (I'd wager most computer users are at 256M and under, with developers at
perhaps 512M or 1G. I *just* got a 2G machine, and that was still a bit pricey.)

Considering that parsing is "simply" data input for an application, and that
the application is probably using a lot of memory itself, this could become
a huge issue.

What's the #1 reason folks don't use DOM to parse XML? It reads the entire
tree into RAM...
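
To make the DOM comparison concrete, here is a minimal sketch of the streaming
alternative using the standard JAXP SAX API. The class name, method name and
sample input are made up for illustration; the point is only that the handler
sees each element as it goes by and no tree is ever built.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingCount {
    // Counts elements with the given name while streaming through the input.
    // Memory use stays roughly constant because nothing is retained per element.
    static int countElements(String xml, final String name) {
        final int[] count = {0};
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attrs) {
                        if (qName.equals(name)) {
                            count[0]++;
                        }
                    }
                });
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new RuntimeException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        System.out.println(countElements("<shows><show/><show/></shows>", "show"));
    }
}
```

A DOM parser handed the same input would materialize every node before the
application could look at any of them.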

Face it, large files are becoming more and more commonplace.

I am happy to hear that this behavior can be avoided. Simple expressions or
Java source code won't be an issue, but I think many people will need to
special-case the behavior.

Bottom line: This needs to be a documented "gotcha" with a well-explained
work-around.

Better: A simple switch to toggle the behavior.

Note that I do like that all of that data is kept, as for compilers and such
that extra info is really handy... But be careful of the assumptions you're
making about the types of applications being written. I'd wager that 80+% of
the apps won't care about preserving whitespace or delimiter chars (like
parens), and thus you're wasting valuable RAM.
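
To illustrate the kind of switch I mean, here is a rough sketch. This is NOT
ANTLR's API; the class, the method and the keepWhitespace flag are all
hypothetical, and the tokenizer is a toy. It only shows how much less gets
stored when whitespace tokens are dropped at the source instead of retained.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenSketch {
    // Hypothetical toggle: when keepWhitespace is false, whitespace runs are
    // skipped instead of being stored as tokens.
    static List<String> tokenize(String input, boolean keepWhitespace) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                // Consume the whole whitespace run.
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
                    i++;
                }
                if (keepWhitespace) {
                    tokens.add(input.substring(start, i));
                }
            } else if (Character.isLetterOrDigit(c)) {
                // Consume an identifier or number.
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                // Any other character is a single-character token.
                i++;
                tokens.add(input.substring(start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String expr = "a + (b * 2)";
        System.out.println(tokenize(expr, true).size());  // everything retained
        System.out.println(tokenize(expr, false).size()); // whitespace dropped
    }
}
```

The same idea would extend to delimiter chars like parens for apps that don't
need them after parsing.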

Later,
-- Scott