Here are my benchmark numbers for parallel tarball extraction with and without mpsafevfs on a 12-processor E4500 running an up-to-date 6.0.

The kernel was built without INVARIANTS and other debugging options, and without ADAPTIVE_GIANT. In my testing ADAPTIVE_GIANT causes about a 200% penalty in system time, with only marginal impact on real or user time. It uses the 4BSD scheduler, because ULE causes spontaneous reboots on this machine. The E4500 uses the esp SCSI controller, which runs without Giant.

The test is this:

    #!/bin/sh
    for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
        mkdir $i
        tar xfC /var/portbuild/sparc64/5/tarballs/bindist.tar $i &
    done
    # wait for all 12 background extractions, so that timing the
    # script measures the extractions and not just the loop
    wait

It runs on a 2000 MB preallocated, malloc-backed md disk (the machine has 5 GB of RAM). Before each test I umount, newfs with default options and mount. Default options means no -U: soft updates slow md down by a factor of several.

The tarball is:

    # ls -l /var/portbuild/sparc64/5/tarballs/bindist.tar
    -rw-r--r-- 1 kris kris 133231104 Apr 28 12:18 /var/portbuild/sparc64/5/tarballs/bindist.tar
    # tar tvf /var/portbuild/sparc64/5/tarballs/bindist.tar | wc -l
    5664

It's a copy of a sparc64 5.4-STABLE world that I use to populate package build chroots.
A single extraction (with the tarball cached) with mpsafevfs=1 takes:

    14.85 real 1.31 user 10.43 sys
    14.90 real 1.31 user 10.40 sys
    15.03 real 1.26 user 10.55 sys
    14.49 real 1.35 user 10.47 sys
    14.50 real 1.36 user 10.42 sys
    14.50 real 1.28 user 10.52 sys
    14.52 real 1.33 user 10.48 sys
    14.44 real 1.38 user 10.36 sys
    14.54 real 1.37 user 10.39 sys
    14.63 real 1.29 user 10.56 sys
    mean=14.64 seconds real time

Without mpsafevfs:

    14.72 real 1.39 user 10.45 sys
    14.70 real 1.40 user 10.47 sys
    14.99 real 1.41 user 10.54 sys
    15.13 real 1.48 user 10.45 sys
    15.18 real 1.40 user 10.50 sys
    14.87 real 1.64 user 10.38 sys
    14.66 real 1.42 user 10.37 sys
    14.69 real 1.49 user 10.30 sys
    14.87 real 1.45 user 10.60 sys
    14.75 real 1.47 user 10.43 sys
    mean=14.86 seconds real time

ministat comparison of real time:

    x !mpsafevfs
    + mpsafevfs
        N           Min           Max        Median           Avg        Stddev
    x  10         14.66         15.18         14.87        14.856    0.18810163
    +  10         14.44         15.03         14.54         14.64     0.2081666
    Difference at 95.0% confidence
            -0.216 +/- 0.186404
            -1.45396% +/- 1.25474%
            (Student's t, pooled s = 0.198388)

So mpsafevfs gives a small but statistically significant benefit (about 1.5% in real time) even for a single, non-concurrent extraction.
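The "Difference at 95.0% confidence" lines come from ministat's two-sample Student's t test with pooled variance. A rough cross-check of that calculation can be written in sh with awk. This is only a sketch: the function name tcompare is hypothetical and not part of ministat. The t table is hard-coded to standard two-sided 95% values for 1 to 30 degrees of freedom, which covers the 10- and 11-run samples here.

```shell
# tcompare: hypothetical helper sketching a ministat-style pooled-variance
# Student's t comparison of two samples at 95% confidence.
# Usage:  tcompare "x1 x2 ..." "y1 y2 ..."
# Prints: (mean_y - mean_x) CI-half-width diff-as-%-of-mean_x half-width-as-%
tcompare() {
    awk -v xs="$1" -v ys="$2" '
    # Set gmean and gvar to the mean and sample variance of a[1..n].
    function stats(a, n,    i, s, d) {
        s = 0
        for (i = 1; i <= n; i++) s += a[i]
        gmean = s / n
        d = 0
        for (i = 1; i <= n; i++) d += (a[i] - gmean) ^ 2
        gvar = d / (n - 1)
    }
    BEGIN {
        # Standard two-sided 95% critical values of Student t, df = 1..30.
        tt = "12.706 4.303 3.182 2.776 2.571 2.447 2.365 2.306 2.262 2.228"
        tt = tt " 2.201 2.179 2.160 2.145 2.131 2.120 2.110 2.101 2.093 2.086"
        tt = tt " 2.080 2.074 2.069 2.064 2.060 2.056 2.052 2.048 2.045 2.042"
        split(tt, t, " ")
        nx = split(xs, x, " ")
        ny = split(ys, y, " ")
        df = nx + ny - 2
        if (nx < 2 || ny < 2 || df > 30) {
            print "tcompare: need 2..30 degrees of freedom" > "/dev/stderr"
            exit 1
        }
        stats(x, nx); mx = gmean; vx = gvar
        stats(y, ny); my = gmean; vy = gvar
        # pooled standard deviation of the two samples
        sp = sqrt(((nx - 1) * vx + (ny - 1) * vy) / df)
        # half-width of the 95% confidence interval on (mean_y - mean_x)
        hw = t[df] * sp * sqrt(1 / nx + 1 / ny)
        printf "%.3f %.3f %.2f %.2f\n", my - mx, hw, 100 * (my - mx) / mx, 100 * hw / mx
    }'
}
```

For example, pass the ten !mpsafevfs real times as the first argument and the ten mpsafevfs real times as the second. The output is `-0.216 0.186 -1.45 1.25`, which agrees with the ministat result above to rounding. Pooling the variances assumes both samples share one variance, as ministat does. Welch's t test is the usual alternative when that assumption looks doubtful.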
The parallel extraction without mpsafevfs:

    319.42 real 35.70 user 1547.38 sys
    317.80 real 35.41 user 1532.87 sys
    318.49 real 35.35 user 1542.23 sys
    321.82 real 35.51 user 1559.50 sys
    317.66 real 35.51 user 1566.16 sys
    318.63 real 35.64 user 1552.48 sys
    319.51 real 35.69 user 1548.99 sys
    317.79 real 35.34 user 1542.89 sys
    319.89 real 35.70 user 1536.34 sys
    318.76 real 35.24 user 1545.21 sys

With mpsafevfs:

    80.24 real 27.70 user 475.54 sys
    83.13 real 27.94 user 491.55 sys
    87.66 real 28.45 user 500.68 sys
    81.88 real 28.12 user 463.51 sys
    83.23 real 27.87 user 483.62 sys
    82.20 real 28.07 user 482.57 sys
    83.82 real 28.29 user 473.70 sys
    84.54 real 27.95 user 472.12 sys
    80.29 real 28.24 user 461.87 sys
    87.77 real 28.34 user 482.03 sys
    82.10 real 27.79 user 475.31 sys

System time:

    x mpsafevfs (first 10 runs)
    + !mpsafevfs
        N           Min           Max        Median           Avg        Stddev
    x  10        461.87        500.68        482.03       478.719     11.975802
    +  10       1532.87       1566.16       1547.38      1547.405     10.066401
    Difference at 95.0% confidence
            1068.69 +/- 10.3942
            223.239% +/- 2.17124%
            (Student's t, pooled s = 11.0624)

Wall-clock time:

    x mpsafevfs (all 11 runs)
    + !mpsafevfs
        N           Min           Max        Median           Avg        Stddev
    x  11         80.24         87.77         83.13     83.350909     2.5277241
    +  10        317.66        321.82        318.76       318.977     1.2618333
    Difference at 95.0% confidence
            235.626 +/- 1.85556
            282.692% +/- 2.2262%
            (Student's t, pooled s = 2.02905)

That is, mpsafevfs shows enormous improvements in both system time (about 3.2 times less) and wall-clock time (about 3.8 times less). Take the 14.64 s mean of a single mpsafevfs extraction as the unit. Then 12 simultaneous extractions take the time of 5.69 single extractions with mpsafevfs and 21.788 without. This is an effective concurrency of 2.11 (12/5.69) extractions with mpsafevfs and 0.55 (12/21.788) without (i.e.
nearly twice as bad as just serializing the extractions).

I might be hitting the bandwidth limit of md here. When I ran less rigorous tests with lower concurrency, I seemed to get marginally better performance: an effective concurrency of about 2.2 for both 3 and 10 simultaneous extractions. So at least it doesn't seem to degrade badly. Or this might reflect VFS lock contention, of which there is certainly a lot according to mutex profiling traces. For package builds on this machine, I certainly get much better performance and lower CPU utilization if I do every package build in a separate (swap-backed) md rather than all of them in a single large md. That tells me it's not hard to saturate a single md.

Even if I am hitting some other limit that caps performance here, filesystem performance with mpsafevfs is clearly much better than without. We are now seeing clear benefits from SMP on 6.0 compared to earlier versions of FreeBSD.

Kris

P.S. Big props to Jeff Roberson for making this work! Thanks also to Hiroki Sato for donating the E4500 and other machine resources.
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:34 UTC