[antlr-interest] Re: Local lookahead depth
lgcraymer
lgc at mail1.jpl.nasa.gov
Mon Nov 10 13:55:22 PST 2003
Oliver--
Do the math. A modern disk drive will sustain a bandwidth of 10-15 MB/s. You have described a problem in which disk usage would
be predominantly sequential (one read and one write per pass), and a large fraction of the disk I/O can be overlapped with
computation. It is decidedly not a "one hour or ten" issue--the performance differential is unlikely to be more than a few percent,
provided that you treat the disk as a block-access device. Besides, between stuff discarded during lexing and stuff discarded during
parsing, that 100 MB is likely to shrink to 1-2 MB of tokens and 10-20 MB of syntax tree.
--Loring
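Loring's "do the math" can be spelled out with a small calculation. The sketch below is a minimal Python estimate under stated assumptions, not a measurement. The helper name `io_seconds_per_pass` is hypothetical. The 10-15 MB/s bandwidth comes from the message above, and the sketch assumes one sequential read and one write of the data per pass, with no seek time and no overlap with computation.

```python
# A rough "do the math" sketch: raw sequential transfer time per pass.
# Assumptions (hypothetical helper, not part of ANTLR): sustained bandwidth
# of 10-15 MB/s as stated above, one full read plus one full write per pass,
# and no seek time or overlap with computation.


def io_seconds_per_pass(data_mb: float, bandwidth_mb_s: float,
                        reads: int = 1, writes: int = 1) -> float:
    """Seconds spent moving data_mb to and from disk in one pass."""
    return data_mb * (reads + writes) / bandwidth_mb_s


if __name__ == "__main__":
    for bw in (10.0, 15.0):
        raw = io_seconds_per_pass(100, bw)   # 100 MB of SGML input
        tree = io_seconds_per_pass(20, bw)   # upper estimate for the syntax tree
        print(f"{bw:.0f} MB/s: input {raw:.1f} s/pass, tree {tree:.1f} s/pass")
```

At 10 MB/s the 100 MB input costs about 20 s of transfer per pass, and at 15 MB/s about 13 s. That is seconds per pass rather than hours. Real figures will also depend on seek patterns and on how much of the I/O actually overlaps with lexing and parsing.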
--- In antlr-interest at yahoogroups.com, Oliver Zeigermann <oliver at z...> wrote:
> lgcraymer wrote:
> > --- In antlr-interest at yahoogroups.com, "Oliver Zeigermann"
> > <oliver at z...> wrote:
> >
> >>>>because of the memory issue. As a very practical example, I have
> >>>>parsed the AMM (Aircraft Maintenance Manual), which is available in
> >>>>SGML (very hard to parse, really). I did this a few years ago using
> >>>>ANTLR, but its size is normally around 100MB. A few years ago my
> >>>>machine had 128MB of RAM! You see what I mean?
> >>>
> >>>And how much disk space did you have? On a UNIX box, mmap() is a
> >>>good way of automating file I/O, but even on systems without virtual
> >>>memory, you can fake it. Performance is not an issue--with a problem
> >>>of this size, nothing stays in the processor cache, and the overhead
> >>>of the disk writes will be only a few percent.
> >>>
> >>>--Loring
> >>
> >>
> >>Loring,
> >>
> >>are you really serious about this? Have a look at the DOM vs. SAX
> >>discussion in the XML area...
> >
> >
> > Of course. Large memory machines are a recent luxury, and it is not
> > hard to use disks efficiently.
>
> You know, when you have large amounts of data to parse, it *does* make a
> difference whether it takes one or ten hours per run.
>
> Oliver
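The mmap() approach from the quoted exchange can be sketched with Python's standard `mmap` module, which wraps mmap(2) on UNIX. This is a minimal illustration, not how ANTLR reads its input. `count_tags` is a hypothetical helper, and counting `<` bytes stands in for a real SGML lexer. The file is mapped rather than read into a buffer, so scanning a 100 MB document does not require allocating 100 MB of heap. The access pattern is sequential, which is the case the disk handles best.

```python
import mmap
import os
import tempfile


def count_tags(path: str, tag: bytes = b"<") -> int:
    """Count occurrences of `tag` by scanning a read-only memory map.

    The file is never copied into a Python bytes object; the OS pages it
    in on demand as the scan moves forward and can evict clean pages.
    """
    if os.path.getsize(path) == 0:
        return 0  # mmap() refuses to map an empty file
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        count = 0
        pos = mm.find(tag)
        while pos != -1:
            count += 1
            pos = mm.find(tag, pos + len(tag))
        return count


if __name__ == "__main__":
    # Hypothetical stand-in for an SGML file such as the AMM.
    with tempfile.NamedTemporaryFile("wb", suffix=".sgm", delete=False) as tmp:
        tmp.write(b"<doc><title>AMM</title><para>text</para></doc>")
        name = tmp.name
    try:
        print(count_tags(name))  # 6: one per "<"
    finally:
        os.remove(name)
```

A lexer could consume the mapped bytes the same way, advancing an offset through `mm` rather than holding the whole document as a string. Whether that helps overall depends on how much of the syntax tree the parser keeps in memory.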