Re: SCHED_ULE should not be the default

From: Steve Kargl <sgk_at_troutmask.apl.washington.edu>
Date: Mon, 12 Dec 2011 11:26:37 -0800

On Mon, Dec 12, 2011 at 01:03:30PM -0600, Scott Lambert wrote:
> On Mon, Dec 12, 2011 at 09:06:04AM -0800, Steve Kargl wrote:
> > Tuning kern.sched.preempt_thresh did not seem to help for
> > my workload.  My code is a classic master-slave OpenMPI
> > application where the master runs on one node and all
> > cpu-bound slaves are sent to a second node.  If I send
> > ncpu+1 jobs to the 2nd node with ncpu cpus, then
> > ncpu-1 jobs are assigned to the 1st ncpu-1 cpus.  The
> > last two jobs are assigned to the ncpu'th cpu, and
> > these ping-pong on this cpu.  AFAICT, it is a cpu
> > affinity issue, where ULE is trying to keep each job
> > associated with its initially assigned cpu.
> > 
> > While one might suggest that starting ncpu+1 jobs
> > is not prudent, my example is just that.  It is an
> > example showing that ULE has performance issues. 
> > So, I now can start only ncpu jobs on each node
> > in the cluster and send emails to all other users
> > to not use those nodes, or use 4BSD and not worry
> > about loading issues.
> 
> Does it meet your expectations if you start (j modulo ncpu) = 0
> jobs on a node?
> 

I've never tried to launch more than ncpu + 1 (or + 2)
jobs.  I suppose at the time I was investigating the issue,
it was determined that 4BSD allowed me to get my work done
in a more timely manner.  So, I took the path of least
resistance.
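
To put rough numbers on the imbalance described in the quoted text,
here is a toy model.  It is only a sketch under assumptions and not a
model of ULE or 4BSD: every job is assumed to need the same amount of
cpu time, "pinned" means a job never leaves the cpu it was first
placed on, and "balanced" is an idealized scheduler that keeps every
cpu busy.  The function names are made up for this illustration.

```python
from math import ceil


def pinned_makespan(njobs, ncpu, work=1.0):
    """Time until the last job finishes if jobs never migrate.

    Assumption: jobs are spread round-robin, so the busiest cpu
    holds ceil(njobs / ncpu) jobs and runs them back to back.
    """
    busiest = ceil(njobs / ncpu)
    return busiest * work


def balanced_makespan(njobs, ncpu, work=1.0):
    """Time until the last job finishes with idealized migration.

    Assumption: total work is shared evenly across all cpus, but a
    single job can never finish faster than its own work.
    """
    return max(work, njobs * work / ncpu)


if __name__ == "__main__":
    ncpu = 4
    for njobs in (ncpu, ncpu + 1, 2 * ncpu):
        print(njobs,
              pinned_makespan(njobs, ncpu),
              balanced_makespan(njobs, ncpu))
```

In this model, ncpu+1 jobs on 4 cpus take 2.0 time units when pinned
versus 1.25 when balanced, while the two cases agree whenever
(j modulo ncpu) = 0.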

-- 
Steve