On Mon, Jul 11, 2011 at 06:07:04PM +0300, Andriy Gapon wrote:
> on 11/07/2011 17:41 Ivan Voras said the following:
> > On 07/07/2011 22:08, Steve Kargl wrote:
> >
> >> 4BSD kernel gives for N = Ncpu + 1.
> >>
> >> 34 processes: 6 running, 28 sleeping
> >>
> >>  PID USERNAME THR PRI NICE  SIZE   RES STATE  C  TIME     CPU COMMAND
> >> 1417 kargl      1  71    0  370M  294M RUN    0  1:30  79.39% sasmp
> >> 1416 kargl      1  71    0  370M  294M RUN    0  1:30  79.20% sasmp
> >> 1418 kargl      1  71    0  370M  294M CPU2   0  1:29  78.81% sasmp
> >> 1420 kargl      1  71    0  370M  294M CPU1   2  1:30  78.27% sasmp
> >> 1419 kargl      1  70    0  370M  294M CPU3   0  1:30  77.59% sasmp
> >
> >> ULE kernel gives for N = Ncpu + 1.
> >>
> >> 34 processes: 6 running, 28 sleeping
> >>
> >>  PID USERNAME THR PRI NICE  SIZE   RES STATE  C  TIME     CPU COMMAND
> >> 1318 kargl      1 103    0  370M  294M CPU0   0  1:31 100.00% sasmp
> >> 1319 kargl      1 103    0  370M  294M RUN    1  1:29 100.00% sasmp
> >> 1322 kargl      1  99    0  370M  294M CPU2   2  1:03  87.26% sasmp
> >> 1320 kargl      1  91    0  370M  294M RUN    3  1:07  60.79% sasmp
> >> 1321 kargl      1  89    0  370M  294M CPU3   3  1:06  55.18% sasmp
> >
> > I can confirm this. Look at the priorities column for the two cases. For some
> > reason (CPU affinity?) the loads get asymmetrical on ULE.
>
> Yeah, but what problem is demonstrated here?

That ULE cannot balance numerically intensive work, leading
to poor performance.

> Are we confident that non-even workload is inherently bad?
> E.g.:
> 79.39 + .. + 77.59 < 5 * 80 = 400
> 100.00 + ... + 55.18 ~~ 402 which is more than theoretically possible :-)
> So it would _appear_ that with ULE we get more work out of available CPUs.
>
> But it's not clear which of the processes are slaves and which is master.
> It's also not clear why the master takes so much CPU (on par with the
> slaves) -
> from my reading of its description (by Steve) it should be doing only light
> periodic work.

These are all slave processes. The master process was on a
different node in the cluster.
Each process is doing the exact same computation with only a
small change in a coordinate from (x,y,z) to (x,y+n*dy,z) with
n = 1, 2, 3, 4. The small change does not cause a different
code path, so all should complete in nearly identical times.

> If it does have to do CPU-heavy work, then I'd imagine that it should
> spawn only Ncpus - 1 slaves.

And if you have M users on the system? Also note, you can
get the exact same loading problem by launching Ncpu+1
completely independent cpu-bound processes. Ncpu-1 processes
will be bound to specific cpus and 2 processes will ping-pong
on one cpu. This ping-ponging will simply kill performance.

> Also, if with ULE we get less jumping around between CPUs than with
> 4BSD, that would mean less cache misses and more useful work done.

Well, yes, less cache misses for the pinned processes; and,
no, for more useful work done.

> Still not convinced that there is a problem with ULE here.

It's ULE. See the last 3 years of my posts on the topic.

> I'd start with the app.

I'd switch to 4BSD ;-).

-- 
Steve

Received on Mon Jul 11 2011 - 14:16:54 UTC
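
For anyone who wants to check the arithmetic being argued over in this message, here is a minimal Python sketch that totals the CPU and TIME columns of the two top(1) listings quoted at the start. The numbers are copied from those listings; the helper names (to_seconds, summarize) are made up for illustration. It assumes, without confirmation from the thread, that both snapshots were taken at a comparable point in the run, so treat the comparison as indicative only.

```python
# (TIME, CPU%) pairs copied from the two top(1) listings quoted in this message.
BSD4 = [("1:30", 79.39), ("1:30", 79.20), ("1:29", 78.81),
        ("1:30", 78.27), ("1:30", 77.59)]
ULE = [("1:31", 100.00), ("1:29", 100.00), ("1:03", 87.26),
       ("1:07", 60.79), ("1:06", 55.18)]


def to_seconds(mmss):
    """Convert a top(1) TIME field of the form M:SS to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(rows):
    """Return (sum of CPU%, sum of TIME in s, max-min TIME in s)."""
    total_pct = round(sum(pct for _, pct in rows), 2)
    times = [to_seconds(t) for t, _ in rows]
    return total_pct, sum(times), max(times) - min(times)


for name, rows in (("4BSD", BSD4), ("ULE", ULE)):
    total_pct, total_time, spread = summarize(rows)
    print(f"{name}: CPU% sum = {total_pct:.2f}, "
          f"TIME sum = {total_time} s, TIME spread = {spread} s")

# 4BSD: CPU% sum = 393.26, TIME sum = 449 s, TIME spread = 1 s
# ULE: CPU% sum = 403.23, TIME sum = 376 s, TIME spread = 28 s
```

The CPU column sums to about 403 under ULE against about 393 under 4BSD, which is the comparison made in the quoted text. The accumulated TIME column points the other way under the stated assumption: the five ULE processes total 376 s with a 28 s gap between the most and least favoured, while the 4BSD processes total 449 s and stay within 1 s of each other.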