Re: More ULE bugs fixed.

From: Bruno Van Den Bossche <bruno.van.den.bossche_at_pandora.be>
Date: Fri, 31 Oct 2003 14:30:56 +0100
Jeff Roberson <jroberson_at_chesapeake.net> wrote:

> On Wed, 29 Oct 2003, Jeff Roberson wrote:
> 
> > On Thu, 30 Oct 2003, Bruce Evans wrote:
> >
> > > > Test for scheduling buildworlds:
> > > >
> > > > 	cd /usr/src/usr.bin
> > > > 	for i in obj depend all
> > > > 	do
> > > > 		MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
> > > > 	done >/tmp/zqz 2>&1
> > > >
> > > > (Run this with an empty /somewhere/obj.  The all stage doesn't
> > > > quite finish.)  On an ABIT BP6 system with a 400MHz and a 366MHz
> > > > CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps
> > > > ethernet and a reasonably fast server) and /somewhere/obj
> > > > ufs1-mounted (on a fairly slow disk; no soft-updates), this
> > > > gives the following times:
> > > >
> > > > SCHED_ULE-yesterday, with not so careful setup:
> > > >        40.37 real         8.26 user         6.26 sys
> > > >       278.90 real        59.35 user        41.32 sys
> > > >       341.82 real       307.38 user        69.01 sys
> > > > SCHED_ULE-today, run immediately after booting:
> > > >        41.51 real         7.97 user         6.42 sys
> > > >       306.64 real        59.66 user        40.68 sys
> > > >       346.48 real       305.54 user        69.97 sys
> > > > SCHED_4BSD-yesterday, with not so careful setup:
> > > >       [same as today except the depend step was 10 seconds
> > > >       slower (real)]
> > > > SCHED_4BSD-today, run immediately after booting:
> > > >        18.89 real         8.01 user         6.66 sys
> > > >       128.17 real        58.33 user        43.61 sys
> > > >       291.59 real       308.48 user        72.33 sys
> > > > SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz
> > > > CPU) with
> > > >     many local changes and not so careful setup:
> > > >        17.39 real         8.28 user         5.49 sys
> > > >       130.51 real        60.97 user        34.63 sys
> > > >       390.68 real       310.78 user        60.55 sys
> > > >
> > > > Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for
> > > > the obj and depend stages.  These stages have little
> > > > parallelism.  SCHED_ULE was only 19% slower for the all stage. 
> > > > ...
> > >
> > > I reran this with -current (sched_ule.c 1.68, etc.).  Result: no
> > > significant change.  However, with a UP kernel there was no
> > > significant difference between the times for SCHED_ULE and
> > > SCHED_4BSD.
> >
> > There was a significant difference on UP until last week.  I'm
> > working on SMP now.  I have some patches but they aren't quite ready
> > yet.
> 
> I have committed my SMP fixes.  I would appreciate it if you could post
> updated results.  ULE now outperforms 4BSD in a single-threaded kernel
> compile and performs almost identically in a 16-way make.  I still
> have a few more things that I can do to improve the situation.  I
> would expect ULE to pull further ahead in the months to come.

I recently had to complete a little piece of software in a course on
parallel computing.  I've put it online[1] (we only had to write the
pract2.cpp file).  It calculates the inverse of a Vandermonde matrix and
allows you to spawn multiple slave processes that each perform a part of
the work.  Everything happens in memory so 
I've used it lately to test the different changes you made to
sched_ule.c and these last fixes do improve the performance on my dual
p3 machine a lot.

Here are the results of my (very limited) tests:

sched_4bsd
---
dimension       slaves          time
1000            1               90.925408
1000            2               58.897038

200             1               0.735962
200             2               0.676660

sched_ule 1.68
---
dimension       slaves          time
1000            1               90.951015
1000            2               70.402845

200             1               0.743551
200             2               1.900455

sched_ule 1.70
---
dimension       slaves          time
1000            1               90.782309
1000            2               57.207351

200             1               0.739998
200             2               0.383545
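
To make the tables easier to compare, here is a minimal sketch that turns each pair of timings into a speedup (time with 1 slave divided by time with 2 slaves).  The times are copied from the tables above; the names TIMES, speedup and speedup_table are only illustrative and are not part of the test program.

```python
# Wall-clock times in seconds, copied from the tables above.
# Key: (scheduler, dimension); value: (time with 1 slave, time with 2 slaves).
TIMES = {
    ("sched_4bsd", 1000): (90.925408, 58.897038),
    ("sched_4bsd", 200): (0.735962, 0.676660),
    ("sched_ule 1.68", 1000): (90.951015, 70.402845),
    ("sched_ule 1.68", 200): (0.743551, 1.900455),
    ("sched_ule 1.70", 1000): (90.782309, 57.207351),
    ("sched_ule 1.70", 200): (0.739998, 0.383545),
}


def speedup(one_slave, two_slaves):
    """Return how many times faster the 2-slave run was than the 1-slave run."""
    return one_slave / two_slaves


def speedup_table(times):
    """Map each (scheduler, dimension) key to its 1-slave/2-slave speedup."""
    return {key: speedup(*pair) for key, pair in times.items()}


if __name__ == "__main__":
    for (sched, dim), s in speedup_table(TIMES).items():
        print(f"{sched:<16}{dim:<8}{s:.2f}")
```

A value below 1 means the second slave made the run slower.  For dimension 200 that is the case with sched_ule 1.68 (about 0.39), while sched_ule 1.70 comes out at about 1.93.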


I'm not really sure if this is very relevant to you, but from the
end-user point of view (me :-)) this does mean something.
Thanks!

[1] <http://users.pandora.be/bomberboy/mptest/final.tar.bz2>
It can be used by running testpract2 with two arguments, the dimension
of the matrix and the number of slaves.  For example, './testpract2 200 2'
will create a matrix of dimension 200 and use 2 slaves.
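
For anyone who wants to repeat the runs, a small driver along these lines could loop over the combinations.  It is only a sketch: it assumes the binary is at ./testpract2 and takes the two arguments described above, and it measures wall-clock time from the outside (including process startup), which is not necessarily how the times in the tables were obtained.  The names time_command and sweep are made up for this example.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the program exits non-zero.
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def sweep(binary, dimensions, slave_counts, runner=time_command):
    """Time `binary <dimension> <slaves>` for every combination given."""
    results = {}
    for dim in dimensions:
        for slaves in slave_counts:
            results[(dim, slaves)] = runner([binary, str(dim), str(slaves)])
    return results


if __name__ == "__main__":
    # Assumed default path; pass a different one as the first argument.
    binary = sys.argv[1] if len(sys.argv) > 1 else "./testpract2"
    print(f"{'dimension':<16}{'slaves':<16}time")
    for (dim, slaves), secs in sweep(binary, [1000, 200], [1, 2]).items():
        print(f"{dim:<16}{slaves:<16}{secs:.6f}")
```

Running each combination several times and keeping the best or median time would make the numbers less sensitive to whatever else the machine is doing.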


-- 
Bruno

... And then there's the guy who bought 20,000 bras, cut them in half,
and sold 40,000 yamalchas with chin straps....
Received on Fri Oct 31 2003 - 04:31:10 UTC
