Re: More ULE bugs fixed.

From: Jeff Roberson <jroberson_at_chesapeake.net> Date: Fri, 31 Oct 2003 22:33:14 -0500 (EST)

On Fri, 31 Oct 2003, Bruno Van Den Bossche wrote:

> Jeff Roberson <jroberson_at_chesapeake.net> wrote:
>
> > On Wed, 29 Oct 2003, Jeff Roberson wrote:
> >
> > > On Thu, 30 Oct 2003, Bruce Evans wrote:
> > >
> > > > > Test for scheduling buildworlds:
> > > > >
> > > > > 	cd /usr/src/usr.bin
> > > > > 	for i in obj depend all
> > > > > 	do
> > > > > 		MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
> > > > > 	done >/tmp/zqz 2>&1
> > > > >
> > > > > (Run this with an empty /somewhere/obj.  The all stage doesn't
> > > > > quite finish.)  On an ABIT BP6 system with a 400MHz and a 366MHz
> > > > > > CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps
> > > > > ethernet and a reasonably fast server) and /somewhere/obj
> > > > > ufs1-mounted (on a fairly slow disk; no soft-updates), this
> > > > > gives the following times:
> > > > >
> > > > > SCHED_ULE-yesterday, with not so careful setup:
> > > > >        40.37 real         8.26 user         6.26 sys
> > > > >       278.90 real        59.35 user        41.32 sys
> > > > >       341.82 real       307.38 user        69.01 sys
> > > > > SCHED_ULE-today, run immediately after booting:
> > > > >        41.51 real         7.97 user         6.42 sys
> > > > >       306.64 real        59.66 user        40.68 sys
> > > > >       346.48 real       305.54 user        69.97 sys
> > > > > SCHED_4BSD-yesterday, with not so careful setup:
> > > > >       [same as today except the depend step was 10 seconds
> > > > >       slower (real)]
> > > > > SCHED_4BSD-today, run immediately after booting:
> > > > >        18.89 real         8.01 user         6.66 sys
> > > > >       128.17 real        58.33 user        43.61 sys
> > > > >       291.59 real       308.48 user        72.33 sys
> > > > > > SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz
> > > > > > CPU) with many local changes and not so careful setup:
> > > > >        17.39 real         8.28 user         5.49 sys
> > > > >       130.51 real        60.97 user        34.63 sys
> > > > >       390.68 real       310.78 user        60.55 sys
> > > > >
> > > > > Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for
> > > > > the obj and depend stages.  These stages have little
> > > > > parallelism.  SCHED_ULE was only 19% slower for the all stage.
> > > > > ...
> > > >
> > > > I reran this with -current (sched_ule.c 1.68, etc.).  Result: no
> > > > significant change.  However, with a UP kernel there was no
> > > > significant difference between the times for SCHED_ULE and
> > > > SCHED_4BSD.
> > >
> > > There was a significant difference on UP until last week.  I'm
> > > working on SMP now.  I have some patches but they aren't quite ready
> > > yet.
> >
> > I have committed my SMP fixes.  I would appreciate it if you could post
> > updated results.  ULE now outperforms 4BSD in a single-threaded kernel
> > compile and performs almost identically in a 16 way make.  I still
> > have a few more things that I can do to improve the situation.  I
> > would expect ULE to pull further ahead in the months to come.
>
> I recently had to complete a little piece of software in a course on
> parallel computing.  I've put it online[1] (we only had to write the
> pract2.cpp file).  It calculates the inverse of a Vandermonde matrix and
> allows you to spawn multiple slave processes that each perform a part of
> the work.  Everything happens in memory so
> I've used it lately to test the different changes you made to
> sched_ule.c and these last fixes do improve the performance on my dual
> p3 machine a lot.
>
> Here are the results of my (very limited) tests:
>
> sched4bsd
> ---
> dimension       slaves          time
> 1000            1               90.925408
> 1000            2               58.897038
>
> 200             1               0.735962
> 200             2               0.676660
>
> sched_ule 1.68
> ---
> dimension       slaves          time
> 1000            1               90.951015
> 1000            2               70.402845
>
> 200             1               0.743551
> 200             2               1.900455
>
> sched_ule 1.70
> ---
> dimension       slaves          time
> 1000            1               90.782309
> 1000            2               57.207351
>
> 200             1               0.739998
> 200             2               0.383545
>
>
> I'm not really sure if this is very relevant to you, but from the
> end-user point of view (me :-)) this does mean something.
> Thanks!
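
To make the quoted table easier to compare, here is a minimal sketch that
computes the speedup from one slave to two for each scheduler and matrix
dimension.  The times are copied verbatim from the table above; the names
TIMES, speedup and report are made up for illustration and are not part of
the test program.  A value below 1.0 means the two-slave run was slower
than the one-slave run, as with sched_ule 1.68 at dimension 200.

```python
# Wall-clock times in seconds, copied from the quoted table.
# Layout: scheduler -> matrix dimension -> number of slaves -> time.
TIMES = {
    "sched4bsd": {
        1000: {1: 90.925408, 2: 58.897038},
        200: {1: 0.735962, 2: 0.676660},
    },
    "sched_ule 1.68": {
        1000: {1: 90.951015, 2: 70.402845},
        200: {1: 0.743551, 2: 1.900455},
    },
    "sched_ule 1.70": {
        1000: {1: 90.782309, 2: 57.207351},
        200: {1: 0.739998, 2: 0.383545},
    },
}


def speedup(times_by_slaves):
    """Return the 1-slave time divided by the 2-slave time."""
    return times_by_slaves[1] / times_by_slaves[2]


def report(times=TIMES):
    """Return a list of (scheduler, dimension, speedup) tuples."""
    rows = []
    for sched, by_dim in times.items():
        for dim, by_slaves in by_dim.items():
            rows.append((sched, dim, speedup(by_slaves)))
    return rows


if __name__ == "__main__":
    for sched, dim, ratio in report():
        print(f"{sched:<16}{dim:<8}{ratio:.2f}x")
```

The ideal value on a dual-CPU machine would be close to 2.0; these are
single runs, so small differences should not be over-interpreted.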

I welcome the feedback, positive or negative, as it helps me improve
things.  Thanks for the report!  Could you run this again under 4bsd and
ULE with the following in your .cshrc:

set time= ( 5 "%Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww %cc/%ww" )

And then time ./testpract2 200 2, etc.  This will give me a few hints about
what's impacting your performance.
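
For anyone not running csh/tcsh, roughly the same counters can be collected
with getrusage(2).  The sketch below is only an approximation of the time
format above, assuming the usual tcsh meanings of the fields (%U/%S user and
system CPU, %E elapsed, %I/%O block input/output, %F major page faults,
%W swaps, %c involuntary and %w voluntary context switches).  The function
name time_command and the dictionary keys are invented for illustration.
It only sees processes that the command itself waited for, and it is
Unix-only.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return resource usage deltas for the finished child."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    status = subprocess.run(argv).returncode  # waits for the child
    elapsed = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "status": status,
        "user": after.ru_utime - before.ru_utime,            # like %U
        "sys": after.ru_stime - before.ru_stime,             # like %S
        "elapsed": elapsed,                                  # like %E
        "inblock": after.ru_inblock - before.ru_inblock,     # like %I
        "oublock": after.ru_oublock - before.ru_oublock,     # like %O
        "major_faults": after.ru_majflt - before.ru_majflt,  # like %F
        "swaps": after.ru_nswap - before.ru_nswap,           # like %W
        "involuntary_csw": after.ru_nivcsw - before.ru_nivcsw,  # like %c
        "voluntary_csw": after.ru_nvcsw - before.ru_nvcsw,      # like %w
    }


if __name__ == "__main__":
    # Example: python timeit.py ./testpract2 200 2
    for key, value in time_command(sys.argv[1:]).items():
        print(f"{key:>16}: {value}")
```

The two context-switch counters are the most interesting ones here: a
large involuntary count on one scheduler but not the other would suggest
the processes are being preempted or migrated more often.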

Thanks!
Jeff

>
> [1] <http://users.pandora.be/bomberboy/mptest/final.tar.bz2>
> It can be used by running testpract2 with two arguments, the dimension
> of the matrix and the number of slaves.  example './testpract2 200 2'
> will create a matrix with dimension 200 and 2 slaves.
>
>
> --
> Bruno
>
> ... And then there's the guy who bought 20,000 bras, cut them in half,
> and sold 40,000 yamalchas with chin straps....
>