On Sun, Feb 08, 2004, Poul-Henning Kamp wrote:
> In message <20040208080630.GA14364_at_VARK.homeunix.com>, David Schultz writes:
> >I spent some time today benchmarking the various proposed pid
> >allocators.  The full results, along with pretty pictures and a
> >more complete analysis, are at:
> >
> >	http://people.freebsd.org/~das/pbench/pbench.html
>
> You _do_ realize that the difference between "tjr" and "net" in the
> bottom plot is not statistically significant ?
>
> Stratification is visibly present from approx 1500 pids and up, and
> ends up being responsible for 1/3rd of the difference by the time
> you get to 5000 pids.
>
> (The tell-tale sign here is that the two data sets both fall on two
> mostly straight lines in a random looking pattern, with practically
> no measurements hitting the interval between the two lines.)
>
> If we assume the stratification has linearity with number of pids,
> which I think looks reasonable, and we read the right hand edge as
> half a second and the left hand edge as zero, we find:
>
>	          (.5 - 0) [second]
>	-------------------------------------- = 10 [nsec] / [iteration*pid]
>	10000 [iterations] * (5000 - 0) [pids]
>
> 10 nsec per operation is getting you into the territory of effective
> TSC-timecounter resolution, RAM access time, cache miss delays,
> and all sorts of other hardware effects.

To avoid jitter and timestamping overhead, I read the time only at the
start and end of the entire sequence of 10000 operations.  I obtained
the sample variance by running the entire test three times, i.e.

	for (pass = 0; pass < 3; pass++) {
		for (nprocs = 100; nprocs < 5000; nprocs++) {
			set up the test, fork the sleepers;
			take starting timestamp;
			for (iter = 0; iter < 10000; iter++)
				run test;
			take ending timestamp;
		}
	}

Nevertheless, you're definitely right about the stratification.  I'm
not sure how to explain that.
My best theory is that there's some confounding factor, such as the
pageout daemon waking up, that has a constant overhead.  If you look
at the original samples, there are always two that are normal and one
outlier.  More samples would probably correct for this; if I were
running the benchmark again, I would do more iterations of the outer
loop in the pseudocode above and fewer of the inner loop.

> So all in all, I would say that you have proven that "tjr" and "net"
> are better than "old", but not that there is any statistically
> significant performance difference between them.

Yes, I realize that.  I took 10 more samples of 10000 forks each with
5000 sleeping processes in the background and got the following:

tjr:
1.130558492
1.125901197
1.144079485
1.118981882
1.131435699
1.123052511
1.133321135
1.121301171
1.133015788
1.124377539

net:
1.116848091
1.119333603
1.117941526
1.121989527
(got an outlier (2.547301682) here, so I reran this test)
1.118023912
1.110658198
1.126021045
1.106436712
1.116406694
1.100889638

These data show a difference at the 95% confidence level, namely, that
the NetBSD algorithm is about 1% faster on a system with 5000 processes
(and only 0.1% faster if you're looking at the total overhead of fork()
rather than vfork()).  I think that pretty much rules out performance
as the deciding factor between the two.

Received on Sun Feb 08 2004 - 00:45:48 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:42 UTC