[antlr-interest] Re: Local lookahead depth

lgcraymer lgc at mail1.jpl.nasa.gov
Mon Nov 10 13:55:22 PST 2003


Oliver--

Do the math.  A modern disk drive will sustain a bandwidth of 10-15 MB/s.  You have described a problem in which disk usage would 
be predominantly sequential (one read and one write per pass), and a large fraction of the disk I/O can be overlapped with 
computation.  It is decidedly not a "one hour or ten" issue--the performance differential is unlikely to be more than a few percent, 
provided that you treat the disk as a block-access device.  Besides, between stuff discarded during lexing and stuff discarded during 
parsing, that 100 MB is likely to shrink into 1-2 MB of tokens and 10-20 MB of syntax tree.
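
As a rough sketch of that arithmetic: the 100 MB size and the 10-15 MB/s bandwidth are the figures from this thread, while the function name and the assumption that each pass writes as much as it reads (a pessimistic case, given how much the data shrinks) are only illustrative.

```python
# Back-of-envelope disk time for one sequential pass over the input.
# Assumption (not from the thread): every pass reads the full input once
# and writes an output of the same size, with no overlap with computation.

def pass_seconds(size_mb, bandwidth_mb_s, reads=1, writes=1):
    """Seconds of pure disk transfer time for a single pass."""
    return size_mb * (reads + writes) / bandwidth_mb_s

for bandwidth in (10, 15):
    seconds = pass_seconds(100, bandwidth)
    print(f"{bandwidth} MB/s: {seconds:.1f} s of disk I/O per pass")
```

Under those assumptions a pass costs on the order of 13-20 seconds of disk time before any overlap with computation, which is small next to a run measured in hours.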

--Loring


--- In antlr-interest at yahoogroups.com, Oliver Zeigermann <oliver at z...> wrote:
> lgcraymer wrote:
> > --- In antlr-interest at yahoogroups.com, "Oliver Zeigermann" 
> > <oliver at z...> wrote:
> > 
> > 
> >>>>because of the memory issue. As a very practical example I have parsing
> >>>>of the AMM (Aircraft Maintenance Manual) which is available in SGML
> >>>>(very hard to parse, really). I parsed this a few years using ANTLR, but
> >>>>its size normally is around 100MB. A few years ago my machine had 128MB
> >>>>of RAM! You see what I mean?
> >>>
> >>>And how much disk space did you have?  On a UNIX box, mmap() is a
> >>>good way of automating file I/O, but even on systems without virtual
> >>>memory, you can fake it. Performance is not an issue--with a problem
> >>>of this size, nothing stays in the processor cache, and the overhead
> >>>of the disk writes will be only a few percent.
> >>>
> >>>--Loring
> >>
> >>
> >>Loring,
> >>
> >>are you really serious about this? Have a look at the DOM vs. SAX
> >>discussion in the XML area...
> > 
> > 
> > Of course.  Large memory machines are a recent luxury, and it is not 
> > hard to use disks efficiently.
> 
> You know, when you have large amounts of data parsed, it *does* make a 
> difference if it takes one or ten hours per run.
> 
> Oliver


 
