--- On Wed, 7/16/08, Steve Kargl <sgk_at_troutmask.apl.washington.edu> wrote:

> From: Steve Kargl <sgk_at_troutmask.apl.washington.edu>
> Subject: Re: ULE scheduling oddity
> To: "Barney Cordoba" <barney_cordoba_at_yahoo.com>
> Cc: current_at_freebsd.org
> Date: Wednesday, July 16, 2008, 5:13 PM
>
> On Wed, Jul 16, 2008 at 07:49:03AM -0700, Barney Cordoba wrote:
> > --- On Tue, 7/15/08, Steve Kargl <sgk_at_troutmask.apl.washington.edu> wrote:
> > > last pid:  3874;  load averages:  9.99,  9.76,  9.43   up 0+19:54:44  10:51:18
> > > 41 processes:  11 running, 30 sleeping
> > > CPU:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
> > > Mem: 5706M Active, 8816K Inact, 169M Wired, 84K Cache, 108M Buf, 25G Free
> > > Swap: 4096M Total, 4096M Free
> > >
> > >   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
> > >  3836 kargl       1 118    0   577M   572M CPU7   7   6:37 100.00% kzk90
> > >  3839 kargl       1 118    0   577M   572M CPU2   2   6:36 100.00% kzk90
> > >  3849 kargl       1 118    0   577M   572M CPU3   3   6:33 100.00% kzk90
> > >  3852 kargl       1 118    0   577M   572M CPU0   0   6:25 100.00% kzk90
> > >  3864 kargl       1 118    0   577M   572M RUN    1   6:24 100.00% kzk90
> > >  3858 kargl       1 112    0   577M   572M RUN    5   4:10  78.47% kzk90
> > >  3855 kargl       1 110    0   577M   572M CPU5   5   4:29  67.97% kzk90
> > >  3842 kargl       1 110    0   577M   572M CPU4   4   4:24  66.70% kzk90
> > >  3846 kargl       1 107    0   577M   572M RUN    6   3:22  53.96% kzk90
> > >  3861 kargl       1 107    0   577M   572M CPU6   6   3:15  53.37% kzk90
> > >
> > > I would have expected to see a more evenly distributed WCPU
> > > of around 80% for each process.
> >
> > I don't see why "equal" distribution is or should be a goal, as that
> > does not guarantee optimization.
>
> The above images may be parts of an MPI application.  Synchronization
> problems simply kill performance.  The PIDs with 100% WCPU could be
> spinning in a loop waiting for PID 3861 to send a message after
> completing a computation.  The factor of 2 difference in TIME for
> PID 3836 and 3861 was still observed after more than an hour of
> accumulated time for 3836.  It appears as if the algorithm for
> cpu affinity is punishing 3846 and 3861.
>
> > Given that the cache is shared between only 2 cpus, it might very well
> > be more efficient to run on 2 CPUs when the 3rd or 4th isn't needed.
> >
> > It works pretty darn well, IMO. It's not like your little app is the
> > only thing going on in the system.
>
> Actually, 10 copies of the little app are the only things running except
> top(1) and a few sleeping system services (e.g., nfsd and sshd).  Apparently,
> you missed the "41 processes: 11 running, 30 sleeping" line above.
>
> --
> Steve

Your apparent argument that somehow every cpu cycle can be sliced equally
and automagically is as silly as the expectation that a first-generation
scheduler will exhibit 100% efficiency across 8 cpus. It's just as likely
an inefficiency in the application as in the kernel.

Received on Thu Jul 17 2008 - 14:12:46 UTC
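
For context on Steve's point that processes "spinning in a loop waiting for
PID 3861 to send a message" can still show 100% WCPU: the following is a
minimal hypothetical sketch of that kind of MPI pattern. It is not the
actual kzk90 program (which was never posted to the list); the loop counts
and structure are invented purely for illustration. The relevant behaviour
is that many MPI implementations busy-poll while blocked in MPI_Barrier or
MPI_Recv, so ranks that are merely waiting on a scheduler-starved peer
still burn a full CPU, which is consistent with the top(1) output quoted
above.

/*
 * Hypothetical stand-in for the kind of lock-step MPI job described in
 * the thread.  Build with: mpicc -O2 -o mpi_sketch mpi_sketch.c
 * Run with:   mpirun -np 10 ./mpi_sketch
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    double x = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int iter = 0; iter < 1000; iter++) {
        /* Identical CPU-bound work on every rank. */
        for (long i = 0; i < 100000000L; i++)
            x += 1.0 / (double)(i + 1);

        /* All ranks synchronize before the next step.  If the scheduler
         * short-changes one rank (as 3846/3861 appear to be above), the
         * other ranks sit here waiting for it -- and because many MPI
         * implementations busy-poll while blocked, top(1) still reports
         * the waiters at ~100% WCPU even though they do no useful work. */
        MPI_Barrier(MPI_COMM_WORLD);
    }

    if (rank == 0)
        printf("done: %f (ranks=%d)\n", x, size);

    MPI_Finalize();
    return 0;
}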