Re: ULE and current.

From: Bruce Evans <bde@zeta.org.au>
Date: Thu, 11 Dec 2003 23:28:00 +1100 (EST)
On Thu, 11 Dec 2003, Jeff Roberson wrote:

> On Thu, 11 Dec 2003, Andy Farkas wrote:
>
> > Jeff Roberson wrote:
> >
> > > Andy Farkas wrote:
> ...
> > > > Adding a third nice process eliminates the idle time, but cpu% is still bad:
> > > >
> > > > team2# nice -7 sh -c "while :; do echo -n;done" &
> > > > team2# sleep 120; top -S
> > > >
> > > >   PID USERNAME   PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
> > > >   705 root       133   -7  1576K   952K CPU0   0   1:53 100.78% 100.78% sh
> > > >   675 root       133   -7  1576K   952K RUN    1  12:12 51.56% 51.56% sh
> > > >   676 root       133   -7  1576K   952K RUN    1  11:30 49.22% 49.22% sh
> > > >   729 root        76    0  2148K  1184K CPU1   1   0:00  0.78%  0.78% top
> > > >    12 root       -16    0     0K    12K RUN    0  24:00  0.00%  0.00% idle: cpu0
> > > >    11 root       -16    0     0K    12K RUN    1   7:00  0.00%  0.00% idle: cpu1
> >
> > And at this point I would expect something like:
> >
> >  sh #0 using 66.3%,
> >  sh #1 using 66.3%,
> >  sh #2 using 66.3%,
> >  idle: cpu0 to be 0%,
> >  idle: cpu1 to be 0%.
>
> This is actually very difficult to get exactly right.  Since all processes
> want to run all the time, you have to force alternating pairs to share the
> second cpu.  Otherwise they won't run for an even amount of time.

Perhaps add some randomness.  Busting the caches every second or so
shouldn't make much difference.  It happens anyway if there are more
processes.
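
Here's a toy user-space sketch of that averaging argument (not scheduler
code; the tick granularity and the use of random() are just assumptions
for illustration): each tick, one of the three CPU-bound processes is
picked at random to sit out, and over time each one accumulates about
two thirds of a CPU.

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    long ran[3] = { 0, 0, 0 };
    long ticks = 1000000;
    long t;
    int i;

    srandom(1);
    for (t = 0; t < ticks; t++) {
        /* One of the three CPU hogs loses the coin toss this tick. */
        int loser = random() % 3;

        for (i = 0; i < 3; i++)
            if (i != loser)
                ran[i]++;
    }
    for (i = 0; i < 3; i++)
        printf("sh #%d: ran %.1f%% of a CPU\n", i, 100.0 * ran[i] / ticks);
    return (0);
}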

> > > I agree that 100.78% is wrong.  Also, the long term balancer should be
> > > kicking one sh process off of the doubly loaded cpu every so often.  I'll
> > > look into this, thanks.
> >
> > Could it be that the scheduler/balancer is confused by different idle
> > processes?  Why does 'systat -p' show 3 idle procs?? :
> >
>
> The vm has an idle thread that zeros pages.  This is the third thread.
>
> >                     /0   /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
> > root     idle: cpu0 XXXXXXXXXXXXXXXX
> > root     idle: cpu1 XXXXXXXXXXXXXXXX
> >              <idle> XXXXXXXXXXXXXXXX

No, <idle> is just cp_time[CP_IDLE] scaled incorrectly.  It is bogus now that
we have actual idle processes.  The scaling for the idle processes seems to
be almost correct (it is apparently scaled by the number of CPUs), but the
scaling or the value for <idle> is apparently off by a factor of the number
of CPUs.  With 2 CPUs and only 2 "nice -7" processes I see the following
inconsistencies:

top:
    sh #0: 75% (all %cpu approximate; often higher, rarely lower)
    sh #1: 75%
    idle: cpu0: 50% (usually significantly higher)
    CPU states: 25% idle (this is derived from cp_time[CP_IDLE] and is
                          essentially the cpu0 idle percentage / 2, except
                          it is a long term average so it has very little
                          jitter)

systat -p:
    sh #0: 26-30% (should be 75% / 2)
    sh #1: 26-30% (should be 75% / 2)
    idle: cpu0: 16-20% (should be 25% with very little jitter)
    <idle>: 5-10% (should be 25%).  It should be the sum of the idle
	    percentages for all "idle: cpuN" threads, (except it shouldn't
	    exist since the thread idle percentages give it in more detail),
	    but it seems to be that divided by the number of CPUs, with
	    additional rounding errors.
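
Spelling the intended scaling out with the numbers from above (an
arithmetic sketch only, not systat's actual code; the variable names are
made up): systat normalizes so that 100% means the whole machine, so each
figure should be the per-CPU figure divided by the number of CPUs, and
<idle> appears to get divided by the CPU count a second time.

#include <stdio.h>

int
main(void)
{
    int ncpus = 2;
    double sh = 75.0;         /* each sh, as a fraction of one CPU (top's value) */
    double idle_cpu0 = 50.0;  /* idle: cpu0, as a fraction of one CPU */
    double idle_cpu1 = 0.0;   /* idle: cpu1 */
    double idle_sum;

    /* What systat -p should display, with 100% meaning the whole machine. */
    printf("sh:         %4.1f%%\n", sh / ncpus);          /* 37.5 */
    printf("idle: cpu0: %4.1f%%\n", idle_cpu0 / ncpus);   /* 25.0 */

    /* <idle> should be the sum of the real idle threads' shares ... */
    idle_sum = (idle_cpu0 + idle_cpu1) / ncpus;           /* 25.0 */
    printf("<idle>:     %4.1f%%\n", idle_sum);

    /*
     * ... but it seems to be that divided by the number of CPUs again
     * (12.5%), which with rounding errors matches the observed 5-10%.
     */
    printf("<idle> as displayed: ~%.1f%%\n", idle_sum / ncpus);
    return (0);
}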

> > So, where *I* get confused is that top(1) thinks that the system can be up
> > to 200% idle, whereas systat(1) thinks there are 3 threads each consuming
> > a third of 100% idleness... who is right?
>
> Both, they just display different statistics. ;-)

Neither; they have different bugs :-).  top actually seems to be
bug-free here, except it intentionally displays percentages that add
up to a multiple of 100%.  This seems to be best.  You just have to
get used to the percentages in the CPU stat line being scaled and the
others not being scaled.

I now understand the case of an idle system:

                    /0   /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
root     idle: cpu0 XXXXXXXXXXXXXXXX
root     idle: cpu1 XXXXXXXXXXXXXXXX
             <idle> XXXXXXXXXXXXXXXX

This should show 50% for each "idle: cpuN" process.  Instead, it tries
to show 33.3% for each idle process including the pseudo one, but has
some rounding errors that make it display 30%.  The factor of 3 to get
33.3% instead of 2 to get 50% for the real idle processes is from
bogusly counting the pseudo-idle process.  The factor of 3 to get 33.3%
instead of 1 to get 100% for the pseudo-idle process is from bogusly
counting the real idle processes.
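
In numbers (again just the arithmetic, not the systat source):

#include <stdio.h>

int
main(void)
{
    int ncpus = 2;
    int idle_entries = ncpus + 1;   /* two real idle threads + pseudo <idle> */
    double machine_idle = 100.0;    /* nothing else is running */

    /* Correct accounting: each real idle thread burns one full CPU. */
    printf("per real idle thread: %.1f%%\n", machine_idle / ncpus);   /* 50.0 */
    printf("pseudo <idle>:        %.1f%%\n", machine_idle);           /* 100.0, if shown at all */

    /* Buggy accounting: the idle time is split across all three entries. */
    printf("as displayed:         %.1f%%\n", machine_idle / idle_entries); /* 33.3, shown as ~30 */
    return (0);
}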

None of these bugs except the percentages being slightly too high are
scheduler-dependent.

Bruce