On 12/12/11 18:06, Steve Kargl wrote:
> On Mon, Dec 12, 2011 at 04:18:35PM +0000, Bruce Cran wrote:
>> On 12/12/2011 15:51, Steve Kargl wrote:
>>> This comes up every 9 months or so, and must be approaching FAQ
>>> status. In an HPC environment, I recommend 4BSD. Depending on the
>>> workload, ULE can cause a severe increase in turnaround time when
>>> doing already long computations. If you have an MPI application,
>>> simply launching greater than ncpu+1 jobs can show the problem. PS:
>>> search the list archives for "kargl and ULE".
>>
>> This isn't something that can be fixed by tuning ULE? For example, for
>> desktop applications kern.sched.preempt_thresh should be set to 224 from
>> its default. I'm wondering if the installer should ask people what the
>> typical use will be, and tune the scheduler appropriately.
>>

Is the tuning of kern.sched.preempt_thresh, and a proper method of
estimating its correct value for the intended workload, documented in the
manpages, maybe tuning(7)? I find it hard to crawl through mailing-list
archives, weighing pros and cons, to arrive at a correct value for this
seemingly important tunable. (A minimal sketch of reading and setting the
sysctl programmatically is at the end of this mail.)

> Tuning kern.sched.preempt_thresh did not seem to help for
> my workload. My code is a classic master-slave OpenMPI
> application where the master runs on one node and all
> cpu-bound slaves are sent to a second node. If I send
> ncpu+1 jobs to the 2nd node with its ncpu cpus, then
> ncpu-1 jobs are assigned to the 1st ncpu-1 cpus. The
> last two jobs are assigned to the ncpu'th cpu, and
> these ping-pong on this cpu. AFAICT, it is a cpu
> affinity issue, where ULE is trying to keep each job
> associated with its initially assigned cpu.
>
> While one might suggest that starting ncpu+1 jobs
> is not prudent, my example is just that. It is an
> example showing that ULE has performance issues.
> So, I now can start only ncpu jobs on each node
> in the cluster and send emails to all other users
> to not use those nodes, or use 4BSD and not worry
> about loading issues.
>
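For reference, here is a minimal sketch of my own (not anything from the
base system or from Bruce's mail) that reads and, as root, sets
kern.sched.preempt_thresh through sysctlbyname(3). The 224 figure is just
the desktop-oriented value Bruce mentions; whether any value helps an HPC
load is exactly what I am unsure about.

/*
 * Minimal sketch: read and optionally set kern.sched.preempt_thresh
 * via sysctlbyname(3).  The oid only exists when the kernel runs
 * SCHED_ULE; writing it requires root.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
	int cur, want;
	size_t len = sizeof(cur);

	/* Read the current threshold. */
	if (sysctlbyname("kern.sched.preempt_thresh", &cur, &len,
	    NULL, 0) == -1) {
		perror("sysctlbyname (read)");	/* absent under SCHED_4BSD */
		return (1);
	}
	printf("kern.sched.preempt_thresh = %d\n", cur);

	/* Optionally set a new value, e.g. 224 for a desktop. */
	if (argc > 1) {
		want = atoi(argv[1]);
		if (sysctlbyname("kern.sched.preempt_thresh", NULL, NULL,
		    &want, sizeof(want)) == -1) {
			perror("sysctlbyname (write)");	/* needs root */
			return (1);
		}
		printf("kern.sched.preempt_thresh set to %d\n", want);
	}
	return (0);
}

(From the shell the same thing is of course just
sysctl kern.sched.preempt_thresh=224, or a line in /etc/sysctl.conf to make
it persistent.)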
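And for concreteness, the kind of master-slave code Steve describes looks
roughly like the hypothetical sketch below (his actual application is not
shown in the thread; the crunch() busy-loop just stands in for a long
cpu-bound computation). Launched so that rank 0 sits on one node and
ncpu+1 slave ranks land on the second node, it reproduces the loading
pattern he describes; watching top -P while it runs should show whether
the last two slaves end up sharing a single cpu.

/*
 * Hypothetical master-slave MPI sketch: rank 0 hands one work item to
 * each slave rank and collects the results; every slave runs a long
 * cpu-bound loop.  Illustrative only.
 */
#include <mpi.h>
#include <stdio.h>

static double
crunch(int item)
{
	/* Stand-in for a long cpu-bound computation. */
	volatile double x = item;
	for (long i = 0; i < 200000000L; i++)
		x = x * 1.0000001 + 1.0;
	return (x);
}

int
main(int argc, char **argv)
{
	int rank, size;

	MPI_Init(&argc, &argv);
	MPI_Comm_rank(MPI_COMM_WORLD, &rank);
	MPI_Comm_size(MPI_COMM_WORLD, &size);

	if (rank == 0) {
		/* Master: send one item per slave, then gather results. */
		double result;
		MPI_Status st;
		for (int i = 1; i < size; i++)
			MPI_Send(&i, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
		for (int i = 1; i < size; i++) {
			MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
			    MPI_COMM_WORLD, &st);
			printf("got %g from rank %d\n", result, st.MPI_SOURCE);
		}
	} else {
		/* Slave: one cpu-bound job, as in the ncpu+1 test case. */
		int item;
		double result;
		MPI_Recv(&item, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
		    MPI_STATUS_IGNORE);
		result = crunch(item);
		MPI_Send(&result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
	}

	MPI_Finalize();
	return (0);
}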