On Thu, Jul 07, 2011 at 10:42:39PM +0300, Andriy Gapon wrote:
> on 07/07/2011 18:14 Steve Kargl said the following:
>>
>> I'm using OpenMPI.  These are N > Ncpu processes not threads,
>
> I used 'thread' in a sense of a kernel thread.  It shouldn't
> actually matter if it's a process or a thread in userland
> in this context.
>
> > and without
> > the loss of generality let N = Ncpu + 1.  It is a classic master-slave
> > situation where 1 process initializes all others.  The n-1 slave processes
> > are then independent of each other.  After 20 minutes or so of number
> > crunching, each slave sends a few 10s of KB of data to the master.  The
> > master collects all the data, writes it to disk, and then sends the
> > slaves the next set of computations to do.  The computations are nearly
> > identical, so each slave finishes its task in the same amount of time.  The
> > problem appears to be that 2 slaves are bound to the same cpu and the
> > remaining N - 3 slaves are bound to a specific cpu.  The N - 3 slaves
> > finish their task, send data to the master, and then spin (chewing up
> > nearly 100% cpu) waiting for the 2 ping-ponging slaves to finish.
> > This causes a stall in the computation.  When a complete computation
> > takes days to complete, these stalls become problematic.  So, yes, I
> > want the processes to get a more uniform access to cpus via migration
> > to other cpus.  This is what 4BSD appears to do.
>
> I would imagine that periodic rebalancing would take care of this,
> but probably the ULE rebalancing algorithm is not perfect. :-)
> There was a suggestion on performance_at_ to try to use a lower value for
> kern.sched.steal_thresh, a value of 1 was recommended:
> http://article.gmane.org/gmane.os.freebsd.performance/3459

node16:kargl[215] uname -a
FreeBSD node16.cimu.org 9.0-CURRENT FreeBSD 9.0-CURRENT #2 r223824M: Thu Jul 7 11:12:15 PDT 2011
node16:kargl[216] sysctl -a | grep smp.cpu
kern.smp.cpus: 4

4BSD kernel gives for N = Ncpu.

33 processes: 5 running, 28 sleeping

  PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME     CPU COMMAND
 1387 kargl      1  67    0   370M   293M CPU1    1   1:31  98.34% sasmp
 1384 kargl      1  67    0   370M   293M CPU2    2   1:31  98.34% sasmp
 1386 kargl      1  67    0   370M   294M CPU3    3   1:30  98.34% sasmp
 1385 kargl      1  67    0   370M   294M RUN     0   1:31  98.29% sasmp

4BSD kernel gives for N = Ncpu + 1.

34 processes: 6 running, 28 sleeping

  PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME     CPU COMMAND
 1417 kargl      1  71    0   370M   294M RUN     0   1:30  79.39% sasmp
 1416 kargl      1  71    0   370M   294M RUN     0   1:30  79.20% sasmp
 1418 kargl      1  71    0   370M   294M CPU2    0   1:29  78.81% sasmp
 1420 kargl      1  71    0   370M   294M CPU1    2   1:30  78.27% sasmp
 1419 kargl      1  70    0   370M   294M CPU3    0   1:30  77.59% sasmp

Recompiling the kernel to use ULE instead of 4BSD with the exact same
hardware and kernel configuration.

ULE kernel gives for N = Ncpu.

33 processes: 5 running, 28 sleeping

  PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME     CPU COMMAND
 1294 kargl      1 103    0   370M   294M CPU3    3   1:30 100.00% sasmp
 1292 kargl      1 103    0   370M   294M RUN     2   1:30 100.00% sasmp
 1295 kargl      1 103    0   370M   293M CPU0    0   1:30 100.00% sasmp
 1293 kargl      1 103    0   370M   294M CPU1    1   1:28 100.00% sasmp

ULE kernel gives for N = Ncpu + 1.

34 processes: 6 running, 28 sleeping

  PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME     CPU COMMAND
 1318 kargl      1 103    0   370M   294M CPU0    0   1:31 100.00% sasmp
 1319 kargl      1 103    0   370M   294M RUN     1   1:29 100.00% sasmp
 1322 kargl      1  99    0   370M   294M CPU2    2   1:03  87.26% sasmp
 1320 kargl      1  91    0   370M   294M RUN     3   1:07  60.79% sasmp
 1321 kargl      1  89    0   370M   294M CPU3    3   1:06  55.18% sasmp

node16:root[165] sysctl -w kern.sched.steal_thresh=1
kern.sched.steal_thresh: 2 -> 1

34 processes: 6 running, 28 sleeping

  PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 1396 kargl      1 103    0   366M   291M CPU3    3   1:30 100.00% sasmp
 1397 kargl      1 103    0   366M   291M CPU2    2   1:30  99.17% sasmp
 1400 kargl      1  97    0   366M   291M CPU0    0   1:05  83.25% sasmp
 1399 kargl      1  94    0   366M   291M RUN     1   1:04  73.97% sasmp
 1398 kargl      1  98    0   366M   291M RUN     0   1:01  54.05% sasmp

-- 
Steve

Received on Thu Jul 07 2011 - 18:08:46 UTC
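
For readers who want the arithmetic behind the stall spelled out, here is a
rough sketch in Python. It is not part of the measurements above and it is
not how either scheduler works internally. It only assumes that a round ends
when the slowest slave has accumulated its CPU time, that each slave needs
about 20 minutes of CPU per round (the figure quoted in the message), and
that the CPU shares are the idealized ones named in the comments. The
function name round_minutes and both share lists are hypothetical.

```python
from fractions import Fraction


def round_minutes(work_minutes, slave_cpu_shares):
    """Wall-clock length of one round.

    Every slave needs `work_minutes` of CPU time, and the master cannot
    hand out the next round until the slowest slave has reported back,
    so the slave with the smallest CPU share sets the pace.
    """
    return work_minutes / min(slave_cpu_shares)


WORK = 20  # CPU minutes per slave per round ("20 minutes or so")

# Assumed "stuck" layout on 4 CPUs with N = Ncpu + 1: two slaves
# ping-pong on one CPU (half a CPU each), two slaves own a CPU each.
pinned = [Fraction(1, 2), Fraction(1, 2), Fraction(1), Fraction(1)]

# Assumed "even" layout: 5 busy processes (4 slaves plus a master that
# is also assumed to be spinning) share 4 CPUs equally, 4/5 of a CPU each.
fair = [Fraction(4, 5)] * 4

pinned_minutes = round_minutes(WORK, pinned)
fair_minutes = round_minutes(WORK, fair)

print("pinned layout:", pinned_minutes, "minutes per round")
print("even layout:  ", fair_minutes, "minutes per round")
```

Under these assumptions the stuck layout takes 40 minutes per round against
25 for the even one, which is the kind of gap that adds up when a complete
computation runs for days.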