Re: buildworld times

From: Stefan Eßer <se_at_FreeBSD.org>
Date: Wed, 3 Mar 2004 23:12:04 +0100

On 2004-03-01 22:24 -0500, Robert Watson <rwatson_at_freebsd.org> wrote:
> FYI, I now have access to a build box at work with two Xeon 2.4GHz
> processors, each with two logical CPUs, and 1GB of memory.  Here are the
> buildworld times, with -DNORESCUE and -DNOPROFILE, 5.2.1-RELEASE
> GENERICish kernel (no WITNESS, INVARIANTS):
> 
> 		Real		User		Sys
> default	2195.16		1717.69		467.78
> -j 2		2003.20		2151.49		539.67
> -j 4		1703.15		2485.99		654.00
> -j 6		1645.34		2595.67		718.12
> -j 8		1627.88		2618.15		743.53

On a normal dual-processor (no SMT) I'd expect the real time to be half
the sum of user and system time (in the best case), except for the 'default'
case, where the times should just sum up (I assume that '-pipe' has not been 
used (?), please correct me if I'm wrong ...).

	(usr + sys)   sum    sum/2  real  ratio
===============================================
default 1718 + 468 => 2186 |      | 2195 | 100%
-j 2    2151 + 540 => 2691 | 1346 | 2003 |  67%
-j 4    2486 + 654 => 3140 | 1570 | 1703 |  92%
-j 6    2596 + 718 => 3314 | 1657 | 1645 | 101%
-j 8    2618 + 744 => 3362 | 1681 | 1628 | 103%

In this table, 'ratio' is calculated as ((user + sys) / NCPU) / real, with
NCPU set to 1 for the default case and to 2 otherwise.
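
For anyone who wants to redo the arithmetic, here is a minimal sketch of the
ratio calculation in Python. The timings are copied from Robert's 5.2.1
numbers quoted above; the function names and the choice to round to the
nearest percent are my own assumptions, not part of any existing tool.

```python
# (real, user, system) in seconds, from the 5.2.1-RELEASE timings quoted above.
TIMES_521 = {
    "default": (2195.16, 1717.69, 467.78),
    "-j 2": (2003.20, 2151.49, 539.67),
    "-j 4": (1703.15, 2485.99, 654.00),
    "-j 6": (1645.34, 2595.67, 718.12),
    "-j 8": (1627.88, 2618.15, 743.53),
}


def ratio_percent(real, user, system, ncpu):
    """CPU time per physical CPU, as a percentage of wall-clock time."""
    return 100.0 * (user + system) / ncpu / real


def ratios(times):
    """Ratio per run; NCPU is 1 for the 'default' run and 2 for the -j runs."""
    result = {}
    for label, (real, user, system) in times.items():
        ncpu = 1 if label == "default" else 2
        result[label] = round(ratio_percent(real, user, system, ncpu))
    return result


if __name__ == "__main__":
    for label, pct in ratios(TIMES_521).items():
        print(f"{label:8s} {pct:4d}%")
```

Feeding in the -CURRENT timings in the same (real, user, system) layout gives
the corresponding ratios for the second set of measurements.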

The system seems to spend negligible time on disk I/O, and the results look 
consistent, except for the '-j 2' case (why ???).

> I assume this is largely a product of hyperthreads (less CPU for the same
> user time, requiring more user time to get the same amount of work done),
> but it's interesting how the usertime goes up substantially with
> parallelism.  I imagine these numbers would be somewhat better with

The numbers look very interesting, especially if your more recent results
under -CURRENT are taken into consideration, too:

On 2004-03-02 17:18 -0500, Robert Watson <rwatson_at_FreeBSD.org> wrote:
> Interestingly, on the same hardware using 5.2-CURRENT GENERIC - WITNESS,
> INVARIANTS, et al (with ULE since that's the default now):
> 
>                 Real            User            Sys
> default         2304.16         1834.51         474.96         # slower
> -j 2            1611.61         2715.89         684.97         # faster!
> -j 4            1416.11         2988.32         878.40         # faster!
> -j 6            1399.92         3090.95         955.74         # fastest!
> -j 8            1405.38         3151.92         1003.69        # fasterish!

	(usr +  sys)   sum    sum/2  real  ratio
================================================
default 1835 +  475 => 2310 |      | 2304 | 100%
-j 2    2716 +  685 => 3401 | 1700 | 1612 | 105%
-j 4    2988 +  878 => 3866 | 1933 | 1416 | 137%
-j 6    3091 +  956 => 4047 | 2024 | 1400 | 145%
-j 8    3152 + 1004 => 4156 | 2078 | 1406 | 148%

A ratio above 100% indicates that the logical CPUs have contributed cycles
to user+system time. Instead of 2 real CPUs, there are 4 virtual CPUs, but
each at a virtually reduced clock speed ;-)

I.e. if an optimized loop keeps a single 'real' CPU completely busy, then
HT can't find any unused functional units to schedule instructions for the
second virtual processor on, and running 2 such loops on a Xeon with HT would
result in two logical CPUs of half the effective clock speed (just like two
compute-bound processes running in parallel on a multi-tasking system).

This example of a tight loop running on a physical processor with 2 logical
CPUs should give the following results:

loops	usr	real   ratio
============================
1	  1	   1	100%
2	  4	   2	200%
4	  8	   4	200%

With 2 loops running in parallel, each one gets half the cycles, resulting in
twice the user time of the first case being reported per process.
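
The small table above can be reproduced with a toy model. This is only a
sketch under idealized assumptions of mine: one physical CPU with 2 logical
CPUs, each loop needing exactly 1 unit of work, the physical core retiring
1 unit of work per unit of time no matter how many logical CPUs are busy,
and every busy logical CPU being charged user time for the whole run. The
function name is made up for illustration.

```python
def tight_loop_accounting(loops, logical_cpus=2):
    """Return (usr, real, ratio) for `loops` tight loops on one physical CPU.

    Idealized model: the physical core completes one unit of work per unit
    of time in total, so HT adds no throughput for this workload.
    """
    busy = min(loops, logical_cpus)  # logical CPUs that are running a loop
    real = loops                     # total work / throughput of the one core
    usr = busy * real                # each busy logical CPU is charged for `real`
    ratio = usr / real               # relative to the single physical CPU
    return usr, real, ratio


if __name__ == "__main__":
    for n in (1, 2, 4):
        usr, real, ratio = tight_loop_accounting(n)
        print(f"{n}\t{usr}\t{real}\t{ratio:.0%}")
```

In this model the reported user time doubles as soon as the second logical
CPU is in use, although no additional work gets done.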

There's nothing wrong with Hyper-Threading slowing down one logical CPU if
another one gains more than the first one loses (as it should be ;-)
(The positive effect of HT seems to be the reduction of real time required
for "buildworld" from 1612s to some 1400s: a speed-up of 15%).
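
To be explicit about what "15%" means here, a short sketch (helper names are
mine): the speed-up is measured as old time over new time, which is not the
same number as the percentage of real time saved.

```python
def speedup_percent(t_before, t_after):
    """Throughput gain: how much more work per second after the change."""
    return 100.0 * (t_before / t_after - 1.0)


def time_saved_percent(t_before, t_after):
    """Reduction in wall-clock time relative to the original run."""
    return 100.0 * (1.0 - t_after / t_before)


if __name__ == "__main__":
    # Real times from the -CURRENT runs: -j 2 (1612s) vs. roughly 1400s at -j 6.
    print(f"speed-up:   {speedup_percent(1612, 1400):.1f}%")
    print(f"time saved: {time_saved_percent(1612, 1400):.1f}%")
```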

But it appears that our user/system time accounting gives misleading results,
at least if you are used to expecting your server CPU to run at a constant
speed as long as the clock frequency remains constant ...


I do not understand why 'ratio' comes out so different (92-103% vs. 137-148%)
for 5.2.1 vs. -CURRENT (for -j 4 and up, where HT plays a role).

Is the process accounting different, or has the scheduler been changed to make
better use of logical processors ???


Regards, STefan
Received on Wed Mar 03 2004 - 13:12:09 UTC
