Re: More ULE bugs fixed.

From: Bruno Van Den Bossche <bruno.van.den.bossche_at_pandora.be>
Date: Sun, 2 Nov 2003 23:52:08 +0100

Jeff Roberson <jroberson_at_chesapeake.net> wrote:

> On Fri, 31 Oct 2003, Bruno Van Den Bossche wrote:
[...]
> > I recently had to complete a little piece of software in a course on
> > parallel computing.  I've put it online[1] (we only had to write the
> > pract2.cpp file).  It calculates the inverse of a Vandermonde matrix and
> > allows you to spawn multiple slave processes that each perform a part of
> > the work.  Everything happens in memory so
> > I've used it lately to test the different changes you made to
> > sched_ule.c and these last fixes do improve the performance on my dual
> > p3 machine a lot.
> >
> > Here are the results of my (very limited) tests:
> >
> > sched4bsd
> > ---
> > dimension       slaves          time
> > 1000            1               90.925408
> > 1000            2               58.897038
> >
> > 200             1               0.735962
> > 200             2               0.676660
> >
> > sched_ule 1.68
> > ---
> > dimension       slaves          time
> > 1000            1               90.951015
> > 1000            2               70.402845
> >
> > 200             1               0.743551
> > 200             2               1.900455
> >
> > sched_ule 1.70
> > ---
> > dimension       slaves          time
> > 1000            1               90.782309
> > 1000            2               57.207351
> >
> > 200             1               0.739998
> > 200             2               0.383545
> >
> >
> > I'm not really sure if this is very relevant to you, but from the
> > end-user point of view (me :-)) this does mean something.
> > Thanks!
> 
> I welcome the feedback, positive or negative, as it helps me improve
> things.  Thanks for the report!  Could you run this again under 4bsd and
> ULE with the following in your .cshrc:
> 
> set time= ( 5 "%Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww %cc/%ww" )
> 
> And then time ./testpract2 200 2, etc.  This will give me a few hints about
> what's impacting your performance.

The program can run either as a master or as a slave: you start one
master and several slaves, and they all work on the same piece of shared
memory.  I therefore timed the individual processes, because the wrapper
script testpract2 does no more than launch a few processes in the
background, so I don't think timing the wrapper itself would be very
relevant.

Here's the result:

sched_4bsd 1.26

1000            1
master: 49.172u 0.187s 2:21.54 34.8% 15+10182k 0+0io 0pf+0w 5962c/65w
slave : 90.326u 0.250s 1:30.75 99.8% 15+168k 0+0io 0pf+0w 9156c/35w

1000            2
master: 49.113u 0.226s 1:49.94 44.8% 15+10181k 0+0io 0pf+0w 5942c/63w
slave1: 55.211u 0.326s 0:59.11 93.9% 15+166k 0+0io 0pf+0w 11129c/2224w
slave2: 54.897u 0.363s 0:58.62 94.2% 15+167k 0+0io 0pf+0w 7111c/6129w

200             1
master: 0.377u 0.007s 0:02.39 15.4% 15+589k 0+0io 0pf+0w 38c/13w
slave : 0.711u 0.031s 0:00.74 100.0% 15+169k 0+0io 0pf+0w 85c/1w

200             2
master: 0.376u 0.007s 0:02.87 12.8% 16+602k 0+0io 0pf+0w 41c/11w
slave1: 0.388u 0.006s 0:01.03 36.8% 18+201k 0+0io 0pf+0w 1245c/408w
slave2: 0.345u 0.038s 0:00.68 54.4% 34+158k 0+0io 0pf+0w 432c/1215w


sched_ule 1.75

1000            1
master: 49.097u 0.163s 2:21.32 34.8% 15+10186k 0+0io 0pf+0w 6197c/163w
slave : 90.157u 0.398s 1:30.82 99.6% 15+168k 0+0io 0pf+0w 11568c/49w

1000            2
master: 49.132u 0.164s 1:48.15 45.5% 15+10155k 0+0io 0pf+0w 6517c/276w
slave1: 55.634u 0.406s 0:57.52 97.4% 15+169k 0+0io 0pf+0w 12745c/9628w
slave2: 55.416u 0.391s 0:57.13 97.6% 15+168k 0+0io 0pf+0w 12448c/10063w

200             1
master: 0.369u 0.016s 0:02.52 14.6% 15+577k 0+0io 0pf+0w 92c/35w
slave : 0.690u 0.054s 0:00.74 100.0% 15+171k 0+0io 0pf+0w 147c/13w

200             2
master: 0.376u 0.007s 0:02.47 14.9% 15+589k 0+0io 0pf+0w 87c/21w
slave1: 0.331u 0.023s 0:00.70 50.0% 15+173k 0+0io 0pf+0w 466c/2135w
slave2: 0.304u 0.040s 0:00.39 87.1% 15+166k 0+0io 0pf+0w 412c/2119w
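
For anyone who wants to compare these lines mechanically, here is a
minimal Python sketch (it is not part of the mptest tarball) that parses
one line in the "set time" format quoted above.  The names
parse_time_line, elapsed_seconds and cpu_percent are made up for this
example.  The field meanings are my reading of the tcsh manual: %c is
involuntary and %w is voluntary context switches.  The recomputed CPU
percentage need not match the printed %P exactly, since the printed
times are rounded.

```python
import re

# Matches one line produced by the tcsh format quoted in this mail:
#   %Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww %cc/%ww
TIME_LINE = re.compile(
    r"(?P<user>\d+\.\d+)u\s+"
    r"(?P<system>\d+\.\d+)s\s+"
    r"(?P<elapsed>[\d:.]+)\s+"
    r"(?P<cpu>\d+(?:\.\d+)?)%\s+"
    r"(?P<text_kb>\d+)\+(?P<data_kb>\d+)k\s+"
    r"(?P<blocks_in>\d+)\+(?P<blocks_out>\d+)io\s+"
    r"(?P<faults>\d+)pf\+(?P<swaps>\d+)w\s+"
    r"(?P<involuntary>\d+)c/(?P<voluntary>\d+)w"
)

INT_FIELDS = ("text_kb", "data_kb", "blocks_in", "blocks_out",
              "faults", "swaps", "involuntary", "voluntary")


def elapsed_seconds(text):
    """Convert a %E value such as '1:49.94' or '1:02:03' to seconds."""
    total = 0.0
    for part in text.split(":"):
        total = total * 60 + float(part)
    return total


def parse_time_line(line):
    """Return the fields of one timing line as a dict of numbers."""
    match = TIME_LINE.search(line)
    if match is None:
        raise ValueError("not a tcsh time line: %r" % line)
    fields = match.groupdict()
    record = {name: int(fields[name]) for name in INT_FIELDS}
    record["user"] = float(fields["user"])
    record["system"] = float(fields["system"])
    record["cpu"] = float(fields["cpu"])
    record["elapsed"] = elapsed_seconds(fields["elapsed"])
    return record


def cpu_percent(record):
    """Recompute CPU utilisation from user, system and elapsed time."""
    return 100.0 * (record["user"] + record["system"]) / record["elapsed"]


if __name__ == "__main__":
    # slave1 of the "200 2" run under each scheduler, copied from above.
    runs = {
        "sched_4bsd": "slave1: 0.388u 0.006s 0:01.03 36.8% 18+201k 0+0io 0pf+0w 1245c/408w",
        "sched_ule": "slave1: 0.331u 0.023s 0:00.70 50.0% 15+173k 0+0io 0pf+0w 466c/2135w",
    }
    for scheduler, line in runs.items():
        rec = parse_time_line(line)
        print(scheduler, rec["elapsed"], rec["involuntary"], rec["voluntary"])
```

Putting the context-switch counts of the two schedulers side by side
this way makes it easier to see where they differ for the same workload.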

> >
> > [1] <http://users.pandora.be/bomberboy/mptest/final.tar.bz2>
> > It can be used by running testpract2 with two arguments, the dimension
> > of the matrix and the number of slaves.  example './testpract2 200 2'
> > will create a matrix with dimension 200 and 2 slaves.

-- 
Bruno

This fortune is inoperative.  Please try another.
Received on Sun Nov 02 2003 - 13:52:14 UTC