[antlr-interest] Java memory mapped IO is slow for big files :(

Terence Parr parrt at cs.usfca.edu
Wed Nov 17 11:37:05 PST 2004


Howdy,

In typical fashion, the Java libraries don't always meet your 
expectations.  I'm using JDK 1.4.2 on my OS X box.  I expected that 
memory mapping a big file would be very fast, but it appears that 
reading it a chunk at a time is MUCH faster (even using ANTLR 2):

Reading a 44M file 1 time:

2m15s memory mapped IO
1m05s ANTLR 2 small buffer
2m12s ANTLR 3 with char[size-of-file]

So reading into a small buffer (BufferedReader) wins easily over making 
a huge buffer.
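
For anyone who wants to try the comparison, the three strategies could 
look roughly like the sketch below.  This is not the benchmark code 
behind the numbers above; the class and method names are made up, it 
assumes a one-byte-per-char encoding (ISO-8859-1), and it uses 
java.nio.file and try-with-resources, which are newer than JDK 1.4.  
Each method just touches or counts every char so the read cost is what 
gets measured.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class for illustration; not part of ANTLR.
class ReadStrategies {

    // Memory-map the whole file, decode it to chars, and walk every char.
    static long countMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer bytes = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            CharBuffer chars = StandardCharsets.ISO_8859_1.decode(bytes);
            long n = 0;
            while (chars.hasRemaining()) {
                chars.get();
                n++;
            }
            return n;
        }
    }

    // Read through a BufferedReader, one small chunk at a time.
    static long countBuffered(Path file, int chunkSize) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            char[] chunk = new char[chunkSize];
            long n = 0;
            int got;
            while ((got = in.read(chunk, 0, chunk.length)) != -1) {
                n += got;
            }
            return n;
        }
    }

    // Read the whole file into a single char[] sized to the file.
    // Assumes the file is under 2 GB and uses one byte per char.
    static long countWholeArray(Path file) throws IOException {
        int size = (int) Files.size(file);
        char[] data = new char[size];
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            int filled = 0;
            while (filled < size) {
                int got = in.read(data, filled, size - filled);
                if (got == -1) {
                    break;
                }
                filled += got;
            }
            return filled;
        }
    }
}
```

Wrapping each call in System.nanoTime() readings and running it on the 
same file gives a rough comparison; results will depend heavily on the 
JVM, the OS, and whether the file is already in the page cache.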

Now reading a small 44 line (1173 byte) file 500 times:

0.69s memory mapped IO
2.35s ANTLR 2 small buffer
0.76s ANTLR 3 with char[size-of-file]

It seems that mmap IO has a small advantage over char[] for small 
files.  The advantage over ANTLR 2 is due to the big overhead in ANTLR 
2, not to reading into a small buffer (the small buffer probably still 
holds the whole file).

BTW, both mmap and char[] take lots of memory to run.  At 2 bytes per 
char, you're up to 88M to hold the buffer for the 44M file ;)
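
The arithmetic behind that figure, as a tiny sketch (the class and 
method names are made up, and it assumes one char per byte of input and 
ignores the small array header):

```java
// Hypothetical helper for illustration only.
class BufferFootprint {
    // A Java char is 16 bits, so a char[] costs 2 bytes per element.
    static long charArrayBytes(long charCount) {
        return charCount * Character.BYTES;
    }
}
```

For a 44M file read as 44M chars, charArrayBytes(44L * 1024 * 1024) 
comes to 88M bytes.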

Hmm... 1.4 NIO doesn't seem worth the effort.  char[] is fast enough 
until you get a big file, and then reading a buffer at a time is 
faster.  Reading a small buffer at a time is fast IO-wise, but the 
CharStream interface must still be able to yield chars at random 
locations for getting token text.  People might have to subclass Token 
to store the actual string if they are parsing 50M files.
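
To make the Token idea concrete, here is a minimal sketch.  The classes 
below are stand-ins invented for illustration; they are not ANTLR's 
Token or CharStream, and the real interfaces may look quite different.  
The point is only that a token which copies its text while the chars 
are still in the current buffer no longer needs random access to the 
whole input.

```java
// Hypothetical classes for illustration; not ANTLR's actual API.
class IndexToken {
    final int type;
    final int start; // absolute index of the first char in the input
    final int stop;  // absolute index of the last char, inclusive

    IndexToken(int type, int start, int stop) {
        this.type = type;
        this.start = start;
        this.stop = stop;
    }

    // Needs the complete input to still be in memory.
    String getText(char[] input) {
        return new String(input, start, stop - start + 1);
    }
}

class TextToken extends IndexToken {
    private final String text;

    // window holds only the current chunk of input;
    // windowOffset is the absolute index of window[0].
    TextToken(int type, int start, int stop, char[] window, int windowOffset) {
        super(type, start, stop);
        // Copy the text now, while the chars are still in the window.
        this.text = new String(window, start - windowOffset, stop - start + 1);
    }

    // The stored copy is returned; the input argument is ignored.
    @Override
    String getText(char[] input) {
        return text;
    }
}
```

The trade-off is one String per token up front instead of one huge 
buffer for the whole file, which only pays off when the input is large.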

Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!





 