Re: read vs. mmap (or io vs. page faults)

From: Mikhail Teterin <mi+kde_at_aldan.algebra.com>
Date: Wed, 23 Jun 2004 02:41:17 -0400

On Tuesday 22 June 2004 11:27 pm, Peter Wemm wrote:

= mmap is more valuable as a programmer convenience these days. Don't
= make the mistake of assuming it's faster, especially since the cost of
= a copy has gone way down.

Actually, let me back off from agreeing with you here :-) On I/O-bound
machines (such as my laptop), there is no discernible difference in
either the CPU or the elapsed time -- md5-ing a file with mmap or read
is (curiously) slightly faster than just cat-ing it into /dev/null.
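
For reference, the two access patterns being compared look roughly like
this. It is only a sketch in Python, not the program that was actually
timed; the function names and the 64KB buffer size are made up for
illustration.

```python
import hashlib
import mmap
import os


def md5_read(path, bufsize=64 * 1024):
    """Hash a file by copying it into a user-space buffer with read()."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)  # kernel copies data into our buffer
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def md5_mmap(path):
    """Hash a file by mapping it; pages arrive via page faults, no copy."""
    digest = hashlib.md5()
    if os.path.getsize(path) == 0:
        return digest.hexdigest()  # an empty file cannot be mapped
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            digest.update(m)  # hashlib reads straight from the mapping
    return digest.hexdigest()
```

Both functions return the same digest for the same file; only the way
the data reaches the process differs.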

On a dual P2 450MHz, the single process always wins the CPU time and
sometimes the elapsed time. Sometimes it wins handsomely:

	mmap: 35.271u 4.004s 1:06.08 59.4%   10+190k 0+0io 4185pf+0w
	read: 32.134u 15.797s 1:58.72 40.3%  408+302k 11228+0io 12pf+0w

or

	mmap: 35.039u 4.558s 1:10.27 56.3%    10+190k 5+0io 5028pf+0w
	read: 29.931u 27.848s 2:07.17 45.4%   10+187k 11219+0io 5pf+0w

Mind you, both processors are Xeons with _2MB of cache on
each_, so memory copying should be even cheaper on them than usual. And
yet mmap manages to win...
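
For anyone reading the numbers: a small sketch that pulls the first
pair of lines apart. It assumes they are in tcsh's default "time"
format (user, system, elapsed, CPU%, shared+unshared memory, input+output
blocks, page faults+swaps); the parse() helper and its field names are
hypothetical.

```python
import re

# Assumed layout (tcsh default `time` format):
#   <user>u <sys>s <elapsed> <cpu>% <shared>+<unshared>k <in>+<out>io <pf>pf+<swaps>w
LINE = re.compile(
    r"(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+(?P<elapsed>[\d:.]+)\s+"
    r"[\d.]+%\s+\d+\+\d+k\s+(?P<inblk>\d+)\+\d+io\s+(?P<pf>\d+)pf\+\d+w"
)


def parse(line):
    m = LINE.search(line)
    if m is None:
        raise ValueError("unrecognised time line: %r" % line)
    # Elapsed is [h:]m:s.ss; fold the colon-separated parts into seconds.
    elapsed = 0.0
    for part in m.group("elapsed").split(":"):
        elapsed = elapsed * 60 + float(part)
    return {
        "user": float(m.group("user")),
        "sys": float(m.group("sys")),
        "elapsed": elapsed,
        "input_blocks": int(m.group("inblk")),
        "page_faults": int(m.group("pf")),
    }


runs = {
    "mmap": parse("35.271u 4.004s 1:06.08 59.4%   10+190k 0+0io 4185pf+0w"),
    "read": parse("32.134u 15.797s 1:58.72 40.3%  408+302k 11228+0io 12pf+0w"),
}
for name, r in runs.items():
    print("%s: cpu=%.3fs elapsed=%.2fs in=%d pf=%d" % (
        name, r["user"] + r["sys"], r["elapsed"],
        r["input_blocks"], r["page_faults"]))
```

Read this way, mmap spends slightly more user time but far less system
time, and trades read's input operations for page faults.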

On a single P2 400MHz (standard 512KB cache) mmap always wins the CPU
time, and, thanks to that, can win the elapsed time on a busy system.
For example, running two of these processes in parallel (on two separate
copies of the same huge file residing on distinct disks) yields (same
1462726660-byte file as in the dual Xeon stats above):

	mmap: 66.989u 7.584s 3:01.76 41.0%    5+238k 90+0io 22456pf+0w
	      65.474u 7.729s 2:38.59 46.1%    5+241k 90+0io 22401pf+0w
	read: 60.724u 42.394s 3:37.01 47.5%   5+241k 22541+0io 0pf+0w
	      61.778u 41.987s 3:35.36 48.1%   5+239k 11256+0io 0pf+0w

That's 182 vs. 215 seconds, or a 15% elapsed-time win for mmap.
Evidently, mmap runs through that "nasty nasty code" faster than read
runs through its own. mmap loses on an idle system, I presume, because
page-faulting is not smart enough to fault pages in ahead as efficiently
as read reads ahead.
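
The arithmetic behind the 182 vs. 215 figure, as a quick sketch. It
assumes the comparison is between the slower of the two mmap runs
(3:01.76) and the faster of the two read runs (3:35.36), which is what
the rounded numbers match; to_seconds() is a hypothetical helper.

```python
def to_seconds(elapsed):
    """Convert [h:]m:s.ss elapsed time into seconds."""
    total = 0.0
    for part in elapsed.split(":"):
        total = total * 60 + float(part)
    return total


mmap_runs = [to_seconds("3:01.76"), to_seconds("2:38.59")]
read_runs = [to_seconds("3:37.01"), to_seconds("3:35.36")]

worst_mmap = max(mmap_runs)  # slower of the two mmap processes
best_read = min(read_runs)   # faster of the two read processes
win = (best_read - worst_mmap) / best_read

print("mmap %.0fs vs. read %.0fs: %.1f%% less elapsed time"
      % (worst_mmap, best_read, 100 * win))
```

Even pitting mmap's worse run against read's better one, mmap comes out
ahead by a bit over 15% on the busy single-CPU box.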

Why am I complaining then? Because I want the "nasty nasty code"
improved so that using mmap is beneficial for the single process too.

Thank you very much! Yours,

	-mi