On Monday 12 December 2011 14:47:57 O. Hartmann wrote:
> > Not fully right, boinc defaults to run on idprio 31 so this isn't an
> > issue. And yes, there are cases where SCHED_ULE shows much better
> > performance than SCHED_4BSD. [...]
>
> Do we have any proof at hand for such cases where SCHED_ULE performs
> much better than SCHED_4BSD? Whenever the subject comes up, it is
> mentioned that SCHED_ULE has better performance on boxes with ncpu > 2.
> But in the end I see contradictory statements here. People complain
> about poor performance (especially in scientific environments), and
> others counter that this is not the case.
>
> Within our department, we developed a highly scalable code for planetary
> science purposes on imagery. It utilizes GPUs via OpenCL if present.
> Otherwise it grabs as many cores as it can.
> By the end of this year I'll get a new desktop box based on Intel's new
> Sandy Bridge-E architecture with plenty of memory. If the colleague who
> developed the code is willing to perform some benchmarks on the same
> hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most
> recent Suse. For FreeBSD I also intend to look at performance with both
> available schedulers.
>
> O.

In my spare time I do some stuff which can be considered "HPC". If I
recall correctly, the loudest supporters of the notion that SCHED_4BSD
is faster than SCHED_ULE are using more threads than there are cores.
That causes contention for CPU cores and, more importantly, unevenly
distributed runtimes among threads, resulting in suboptimal execution
times for their programs. Since I've never actually seen the code in
question, it's hard to say whether this "unfair" distribution actually
results in lower throughput, or whether it simply violates an assumption
in the code that each thread takes about as long to finish its task.

Although I haven't actually benchmarked the two schedulers directly, I
have no reason to suspect SCHED_ULE of suboptimal performance because:

1) A program model where N threads on N cores take work items from a
   shared queue until it is empty scales almost perfectly on SCHED_ULE
   (I get 398% CPU usage on a quad-core).
2) The same program on Linux (dual boot), compiled with exactly the same
   compiler and flags, runs slightly slower. I think this has to do with
   VM differences.

What I'm trying to say is that until someone actually shows some code
which has demonstrably lower performance on SCHED_ULE, and where this is
not caused by (IMHO) improper timing dependencies between threads, I'd
say that there is no cause for concern here. I actually expect
performance differences between the two schedulers to show up in
problems which cause a lot more contention on the CPU cores and use lots
of locks internally, so that threads are frequently waiting on each
other, for instance the MySQL benchmarks done a couple of years ago by
Kris Kennaway.

Aside from algorithmic limitations (SCHED_4BSD doesn't really scale all
that well), there will always be some problems in which SCHED_4BSD is
faster because it happens to pick a better execution order for them...
The good thing is people have a choice :-).

I'm looking forward to the results of your benchmark.

-- 
Pieter de Goeje

Received on Mon Dec 12 2011 - 15:31:01 UTC
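
For readers unfamiliar with the program model from point 1 above (N
worker threads draining one shared queue until it is empty), here is a
minimal sketch in Python. It is not the program discussed in the
message; the names run_work_queue and sum_of_squares are hypothetical
and exist only for illustration. Note also that CPython's global
interpreter lock serializes CPU-bound threads, so this shows the
structure of the model and would not be expected to reproduce the
near-linear scaling reported above.

```python
import os
import queue
import threading


def run_work_queue(items, work, n_threads=None):
    """Process items with n_threads workers pulling from one shared queue."""
    # Default to one thread per core, as in the "N threads on N cores" model.
    n_threads = n_threads or os.cpu_count() or 1

    # Fill the shared queue before any worker starts.
    tasks = queue.Queue()
    for item in items:
        tasks.put(item)

    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                item = tasks.get_nowait()  # take the next work item
            except queue.Empty:
                return  # queue drained: this worker is finished
            value = work(item)
            with results_lock:
                results.append((item, value))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def sum_of_squares(n):
    # Hypothetical stand-in for one CPU-bound work item.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    out = run_work_queue(range(1, 9), sum_of_squares, n_threads=4)
    print(sorted(out))
```

Because each worker takes a new item only when it has finished the
previous one, no thread depends on the others finishing at the same
time, which is the property that avoids the timing assumptions
mentioned above.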