On Wed, 2 Nov 2005, Robert Watson wrote:

> On Wed, 2 Nov 2005, Gavin Atkinson wrote:
>
>> On Wed, 2005-11-02 at 16:23 +0100, Bernd Walter wrote:
>>> On Wed, Nov 02, 2005 at 02:58:36PM +0000, Gavin Atkinson wrote:
>>>> I'm seeing incredibly poor performance when serving files from an SMP
>>>> FreeBSD 6.0-RC1 server to a Solaris 10 client. I've done some
>>>> experimenting and have discovered that either removing SMP from the
>>>> kernel or setting debug.mpsafenet=0 in loader.conf massively improves
>>>> the speed. Switching preemption off also seems to help.
>>>
>>> Which scheduler?
>>
>> 4BSD. As I say, I'm running 6.0-RC1 with the standard GENERIC kernel,
>> apart from the options I have listed as being changed above. Polling is
>> therefore also not enabled.
>
> This does sound like a scheduling problem. I realize it's time-consuming,
> but would it be possible to have you run each of the above test cases
> twice more (or maybe even once) to confirm that in each case the result
> is reproducible? I've recently been looking at a scheduling problem
> relating to PREEMPTION and the netisr for loopback traffic, which is
> basically a result of poorly timed context switching turning into a
> worst-case scenario. I suspect something similar is happening here. Have
> you tried varying the number of nfsd worker threads on the server to see
> how that changes matters?

No problem. Sorry it's taken so long to get back to you, it's been a hectic
week :( Anyway, the trend is consistently reproducible, although the results
themselves can vary between runs in the SMP/mpsafenet cases by as much as
20%. Here are the averages of three reruns, which I've also done for ULE:

                                         4BSD     ULE
No SMP, mpsafenet=1                      78.7    62.7
No SMP, mpsafenet=0                      71.1    76.0
No SMP, mpsafenet=1, no PREEMPTION       54.7    55.5
No SMP, mpsafenet=0, no PREEMPTION       73.6    77.6
SMP, mpsafenet=1                        346.5   309.5
SMP, mpsafenet=0                         56.9    88.4
SMP, mpsafenet=1, no PREEMPTION         320.2   136.6
SMP, mpsafenet=0, no PREEMPTION          57.0    77.9

The above are results for 4 nfsd servers (nfsd -n 4). It turns out that you
were correct in thinking that the number of nfsd processes would make a
difference; here are some timings for the GENERIC+SMP kernel (i.e. with
PREEMPTION and 4BSD, the slowest combination above) with varying numbers of
processes:

nfsd threads:     1      2      4      8     12     16
timing:        52.8   59.2  319.3  356.1  377.3  388.1

As before, all tests were done with a freshly rebooted server and a single
"dry run" transfer to warm the VM cache. The file transferred each time is
512 MB worth of /dev/random output. I'm actually quite surprised at how
much difference reducing the number of threads made.

Does all of this information help track down the cause of the problem? I'm
happy to time more transfers with different configs if you want to explore
other avenues.

Thanks,

Gavin
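
For anyone wanting to reproduce the setup described above, a minimal sketch
of the test procedure follows. The export path, mount point, and the
rc.conf/loader.conf fragments are illustrative assumptions; only the 512 MB
/dev/random test file, the warm-up dry run, the debug.mpsafenet tunable, and
the nfsd thread count come from the thread itself.

    # --- FreeBSD 6.0 server (share name and paths are assumptions) ---
    # Create the 512 MB test file from /dev/random, as described above:
    dd if=/dev/random of=/export/testfile bs=1m count=512

    # /etc/rc.conf: run the NFS server with 4 worker threads (nfsd -n 4):
    #   nfs_server_enable="YES"
    #   nfs_server_flags="-u -t -n 4"

    # /boot/loader.conf: the knob being toggled between test runs:
    #   debug.mpsafenet="0"      # (or "1")

    # --- Solaris 10 client (mount point is an assumption) ---
    mount -F nfs server:/export /mnt

    # One untimed "dry run" to warm the server's VM cache, then the timed copy:
    cp /mnt/testfile /dev/null
    time cp /mnt/testfile /dev/null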