Re: 40% slowdown with dynamic /bin/sh

From: Bruce Evans <bde_at_zeta.org.au>
Date: Thu, 27 Nov 2003 18:27:47 +1100 (EST)
On Wed, 26 Nov 2003, Garance A Drosihn wrote:

> At 12:23 AM -0500 11/26/03, Michael Edenfield wrote:
> >
> >Just to provide some real-world numbers, here's what I got
> >out of a buildworld:
>
> I have reformatted the numbers that Michael reported,
> into the following table:
>
> >Static /bin/sh:         Dynamic /bin/sh:
> >   real    385m29.977s     real    455m44.852s   => 18.22%
> >   user    111m58.508s     user    113m17.807s   =>  1.18%
> >   sys      93m14.450s     sys     103m16.509s   => 10.76%
> >                           user+sys              =>  5.53%

What are people doing to make buildworld so slow?  I once optimized
makeworld to take 75 minutes on a K6-233 with 64MB of RAM.  Things
have been pessimized a bit since then, but not significantly except for
the 100% slowdown of gcc (we now build large things like secure but
this is partly compensated for by not building large things like perl).
Michael's K7-500 with 320MB (?) of RAM should be several times faster
than the K6-233, so I would be unhappy if it took more than 75 minutes
but would expect it to take a bit more than 2 hours when well configured.

> Here are some buildworld numbers of my own, from my system.
> In my case, I am running on a single Athlon MP2000, with a
> gig of memory.  It does a buildworld without paging to disk.

I have a similar configuration, except with a single Athlon XP1600
overclocked by 146/133 and I always benchmark full makeworlds.  I
was unhappy when the gcc pessimizations between gcc-2.95 and gcc-3.0
increased the makeworld time from about 24 minutes to about 33 minutes.
The time has since increased to about 38 minutes.  The latter is
cheating slightly -- I leave out the DYNAMICROOT and RESCUE mistakes
and the KERBEROS non-mistake.

> Static sh, No -j:      Dynamic sh, No -j:
>    real    84m31.366s     real    86m22.429s   =>  2.04%
>    user    50m33.013s     user    51m13.080s   =>  1.32%
>    sys     29m59.047s     sys     33m04.082s   => 10.29%
>                           user+sys             =>  4.66%
>
> Static sh, -j2:        Dynamic sh, -j2:
>    real    92m38.656s     real    95m21.027s   =>  2.92%
>    user    51m48.970s     user    52m29.152s   =>  1.29%
>    sys     32m07.293s     sys     34m40.595s   =>  7.95%
>                           user+sys             =>  3.84%

This also shows why -j should not be used on non-SMP machines.  Apart
from the make -j bug that causes missed opportunities to run a job,
make -j increases real and user times due to competition for resources,
so it can only possibly help on systems where unbalanced resources (mainly
slow disks) give too much idle time.

My current worst makeworld time is not much more than half of the fastest
buildworld time in the above (2788 seconds vs 5071 seconds).  From my
collection of makeworld benchmarks:

%%%
Fastest makeworld on a Celeron 366 overclocked by 95/66 (2000/05/15):
    3309.30 real      2443.75 user       488.68 sys

Last makeworld on a Celeron 366 overclocked by 95/66 (2001/11/19):
    4219.83 real      3253.04 user       667.64 sys

Fastest makeworld on an Athlon XP1600 overclocked by 146/133 (2002/01/03):
    1390.18 real       913.56 user       232.63 sys

Last makeworld before gcc-3 on an Athlon XP1600 o/c by 143/133 (2002/05/09)
(overclocking reduced due to memory problems, and some local
memory-related optimizations turned off):
     1532.99 real      1093.08 user       293.15 sys

Early makeworld with gcc-3 on an Athlon XP1600 o/c by 143/133 (2002/05/12):
    2268.13 real      1613.25 user       313.56 sys

Fastest makeworld with gcc-3 on an Athlon XP1600 overclocked by 146/133
(maximal overclocking recovered; memory increased from 512MB to 1GB, local
memory-related optimizations turned on and tuned) (2003/03/31):
    1929.02 real      1576.67 user       205.30 sys

Last makeworld before <the default bloat became too large for me and I
started stopping it by putting things like NO_KERBEROS in
/etc/make.conf> on an Athlon XP1600 o/c by 143/133 (2003/04/29):
    2012.75 real      1637.59 user       225.07 sys

Makeworld with the defaults (no /etc/make.conf and no local optimizations
in the src tree; mainly no pessimizing for Athlons by optimizing for PII's,
and no building dependencies; only optimizations in the host environment
(mainly no dynamic linkage)) on an Athlon as usual (2003/05/06):

Last recorded makeworld with local source and make.conf optimizations
(mainly no dynamic linkage) on an Athlon as usual (2003/10/22):
    2225.83 real      1890.64 user       256.33 sys

Last recorded makeworld with the defaults on an Athlon as usual (2003/11/11):
    2788.41 real      2316.49 user       357.34 sys
%%%
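
For reference, the 5071 seconds in the comparison above is the quoted
84m31.366s real time converted to seconds.  A minimal sketch of that
conversion (to_seconds is a name made up here, and the NNmSS.SSSs input
format is assumed from the quoted tables):

```shell
# to_seconds is a hypothetical helper, not part of time(1) or any shell.
# It converts a value like "84m31.366s" to whole seconds (rounded).
to_seconds() {
    echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.0f\n", $1 * 60 + $2 }'
}

to_seconds 84m31.366s    # prints 5071
```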

I don't see such a large slowdown from using a dynamic /bin/sh.  Unrecorded
runs of makeworld gave times like the following:

    2262 real ... with local opts including src ones and no dynamic linkage
    2290 real ... with same except for /bin/sh (only) dynamically linked
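
The relative cost of the dynamic /bin/sh in those two runs works out to
a little over 1%.  A minimal sketch of the arithmetic (pct_slower is a
hypothetical helper, not part of any tool):

```shell
# pct_slower BASE CUR: percentage by which CUR exceeds BASE.
# Hypothetical helper; awk does the floating-point arithmetic.
pct_slower() {
    awk -v base="$1" -v cur="$2" \
        'BEGIN { printf "%.2f\n", (cur - base) / base * 100 }'
}

pct_slower 2262 2290    # prints 1.24
```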

The difference may be because my /usr/bin/true and similar utilities remain
statically linked.  Fork-exec expense depends more on the exec than the fork.
From an old benchmark for fork-exec of tiny programs:

%%%
st = statically linked
sh = dynamically linked
The numbers are the real, user and system times (using a real time(1)).

K6-233
------
st-st	 0.93	 0.01	 0.91
sh-st	 1.75	 0.02	 0.70
st-sh	 3.94	 0.70	 3.20
sh-sh	 5.14	 1.08	 4.03
%%%
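
A benchmark of this kind can be approximated by timing a shell that
repeatedly runs a tiny program.  The following is only a sketch, not the
original benchmark: it assumes the first label is the linkage of the
parent and the second that of the child (the table does not spell this
out), spawn_n is a made-up name, and the count of 1000 is arbitrary.

```shell
# spawn_n N CMD [ARGS...]: run CMD N times from this shell and print how
# many invocations exited with status 0.  An external CMD costs one
# fork-exec per iteration; a shell builtin would not.
spawn_n() {
    n=$1
    shift
    i=0
    ok=0
    while [ "$i" -lt "$n" ]; do
        "$@" && ok=$((ok + 1))
        i=$((i + 1))
    done
    echo "$ok"
}

# Use a full path so that the shell's builtin "true" is not used instead.
spawn_n 1000 /usr/bin/true
```

Saving this in a file and running it under time(1) with a statically
and then a dynamically linked sh, against a static and then a dynamic
true, would give the four combinations.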

> Buildworld, static, with no '-j',
>                   executed /bin/sh  32,308 times.
>
> Buildworld, static, with '-j2',
>                   executed /bin/sh  32,802 times.

Turning on accounting must have pessimized things a bit.  I think you
are also using a pessimized kernel (with INVARIANTS and WITNESS).
makeworld times should be dominated by the gcc hog, but your sys times
are almost as large as your user times.

The small 1% pessimization for my world and Warner's world is only
small because gcc is so slow.

As John Dyson said, even macro-benchmarks like makeworld can provide
numbers that are hard to interpret.  My system is fairly well balanced,
so the idle time is fairly small, but it is still large enough for
lots of useful zeroing of pages to be done in the idle thread.  Other
measurements show that the idle thread used to take about 60 seconds
(almost 3% of the makeworld time), but I optimized it to take about
30 seconds.  If idle zeroing is turned off, then the real time for
makeworld doesn't change much but the system time increases by approx.
the same time that the idlezero thread took, provided there are enough
idle cycles.  Dynamic linkage is quite likely to disturb these times
by requiring more zero pages.
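
As a rough illustration of the idle time, assuming a single CPU so that
real time minus (user + sys) approximates idle plus I/O wait, the figures
from the 2003/11/11 run with the defaults can be plugged into a sketch
like this (idle_seconds is a made-up helper name):

```shell
# idle_seconds REAL USER SYS: wall-clock seconds not accounted for by
# CPU time.  Hypothetical helper; only meaningful on a uniprocessor.
idle_seconds() {
    awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN { printf "%.2f\n", r - u - s }'
}

idle_seconds 2788.41 2316.49 357.34    # prints 114.58
```

That is about 4% of the real time for that run.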

> On all attempts, I started out by doing:
>      rm -Rf /usr/obj/usr/src/*
>      sync ; sleep 1 ; sync ; sleep 1 ; sync
>
> before doing the 'make' command.  I usually start up a 'script'

I use:

	# /c async mounted
	cd /c/z || exit 1
	rm -rf obj/* root/*
	chflags -R noschg obj root
	rm -rf obj/* root/*
	reboot ...
	# Sometimes: export __MAKE_CONF=/etc/nonesuch
	cd /wherever/src || exit 1
	DESTDIR=/c/z/root \
	MAKEOBJDIRPREFIX=/c/z/obj \
	time -l make -s world > /tmp/world.out 2>&1

Rebooting doesn't affect the times much in relative terms (it
minimizes them, short of the optimization of prefetching /usr/src),
but it reduces the variance to less than a second provided the
system is mostly idle.

> Aside: building 5.1-"security" on this same hardware took
> the following times:
>    real    54m10.092s   [  71.03% ]
>    user    41m39.121s   [  24.40% ]
>    sys     10m20.325s   [ 210.69% ]
>
> And those times *are* with 'script' running, as well as a
> perl-script which I use to summarize "interesting" data from
> the output of a buildworld.  So, those times include extra
> overhead which is not included in the above buildworlds.
> That's from a 'make -j3', and obviously has a static /bin/sh.

Why so much faster?  Now the times are only 20% larger than mine.

> So, if you take that as the base, then the buildworld for
> 5.2-release (using *static* /bin/sh and -j2) will see the
> performance hits that I put in brackets.  That probably seems
> like a pretty horrifying hit, but remember that 5.1-release
> did *not* build /rescue at all (not for me at least :-), and
> that is probably a significant part of the increase.

Building rescue only accounts for about 2 minutes of the 86-54
difference.

> For those who think I'm spoiled by fast hardware, please note
> that all of the above has been done while doing just two
> buildworlds and one buildkernel+installkernel on my sparc64
> box (and that second buildworld is not done yet...).  So I
> certainly am interested in how freebsd runs on "slower HW"!

Single Athlon 1600-2000's are slow hardware :-).  I plan to upgrade
to an Athlon 2800 soon, but expect to be unhappy that this doesn't
recover compile-time performance lost to gcc pessimizations.  Moore's
law seems to be hitting physical limits for CPU, so software bloat is
now outrunning hardware improvements.

Bruce
Received on Wed Nov 26 2003 - 22:28:06 UTC
