Re: Improved multiprocessor usage on amd64

From: Stephen Montgomery-Smith <stephen_at_math.missouri.edu>
Date: Wed, 17 Sep 2008 16:24:20 -0500

Dan Nelson wrote:
> In the last episode (Sep 15), Stephen Montgomery-Smith said:
>> Stephen Montgomery-Smith wrote:
>>> Steve Kargl wrote:
>>>> On Mon, Sep 15, 2008 at 07:36:04PM -0500, Stephen Montgomery-Smith wrote:
>>>>> ... and each thread is a loop of the form
>>>>>
>>>>> while (1) {
>>>>>   wait until told to start;
>>>>>   do massive amounts of floating point arithmetic (only additions and
>>>>> multiplications) on large arrays;
>>>>>   tell the master process that you are done;
>>>>> }
>>>>>
>>>>>> Do you have about as many threads as processors, or more?
>>>>> Both ways.  The time difference between the two approaches is 
>>>>> negligible.
>>>>>
>>>> Are you using ULE?  With my MPI applications, if the number of
>>>> launched processes exceeds the number of cpus by 1, ULE falls
>>>> through the floor.  I have a nagging feeling that there is a problem 
>>>> with cpu affinity.
>>>>
>>>> http://lists.freebsd.org/pipermail/freebsd-current/2008-July/086917.html
>> Let me say a little bit more.
>>
>> I have this gut feeling that the problem has a lot to do with cache 
>> management.  My program has each thread doing, in effect, huge matrix 
>> multiplications, each one working on their own little bit.  If a CPU 
>> core changes from one thread to another, it then has to flush out the 
>> cache to RAM, and read in a whole bunch of other RAM into cache.
> 
> You can try playing with the new cpuset functions in HEAD and 7-STABLE
> to lock particular threads on certain CPUs.
> 

It was an excellent suggestion, but pinning the threads to CPUs didn't
make any difference.
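
For anyone following along, here is a minimal sketch of the thread structure
described earlier in the thread, with each worker pinned to one CPU. It is
not my actual program. The names NWORKERS, CHUNK, pin_self and run_rounds are
made up for illustration, and a sum of squares stands in for the real matrix
arithmetic. The cpuset_setaffinity() call is only compiled on FreeBSD; on
other systems pin_self() is a no-op, which is an assumption made so the
sketch builds anywhere.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __FreeBSD__
#include <sys/param.h>
#include <sys/cpuset.h>
#endif

#define NWORKERS 4      /* hypothetical number of worker threads */
#define CHUNK    1024   /* hypothetical size of each worker's slice */

static pthread_mutex_t lock     = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  start_cv = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  done_cv  = PTHREAD_COND_INITIALIZER;
static int round_no, ndone, quit;       /* all protected by lock */
static double data[NWORKERS][CHUNK];
static double partial[NWORKERS];

/* Pin the calling thread to one CPU. Does nothing outside FreeBSD. */
static int pin_self(int cpu)
{
#ifdef __FreeBSD__
    cpuset_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    /* id -1 with CPU_WHICH_TID means "the current thread" */
    return cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
                              sizeof(mask), &mask);
#else
    (void)cpu;
    return 0;
#endif
}

static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;
    int seen = 0;                       /* last round this worker ran */

    (void)pin_self(id);                 /* failure is ignored in this sketch */
    for (;;) {
        /* wait until told to start */
        pthread_mutex_lock(&lock);
        while (round_no == seen && !quit)
            pthread_cond_wait(&start_cv, &lock);
        if (quit) {
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        seen = round_no;
        pthread_mutex_unlock(&lock);

        /* stand-in for the floating point work on this worker's slice */
        double s = 0.0;
        for (size_t i = 0; i < CHUNK; i++)
            s += data[id][i] * data[id][i];

        /* tell the master that this worker is done */
        pthread_mutex_lock(&lock);
        partial[id] = s;
        ndone++;
        pthread_cond_signal(&done_cv);
        pthread_mutex_unlock(&lock);
    }
}

/* Master: start the workers, run the given number of rounds, add up results. */
double run_rounds(int rounds)
{
    pthread_t tid[NWORKERS];
    double total = 0.0;

    round_no = ndone = quit = 0;
    for (int w = 0; w < NWORKERS; w++)
        for (size_t i = 0; i < CHUNK; i++)
            data[w][i] = 1.0;
    for (int w = 0; w < NWORKERS; w++) {
        int rc = pthread_create(&tid[w], NULL, worker, (void *)(intptr_t)w);
        assert(rc == 0);
        (void)rc;
    }
    for (int r = 0; r < rounds; r++) {
        pthread_mutex_lock(&lock);
        ndone = 0;
        round_no++;                     /* tell every worker to start */
        pthread_cond_broadcast(&start_cv);
        while (ndone < NWORKERS)        /* wait for all of them to finish */
            pthread_cond_wait(&done_cv, &lock);
        for (int w = 0; w < NWORKERS; w++)
            total += partial[w];
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    quit = 1;                           /* release the workers for good */
    pthread_cond_broadcast(&start_cv);
    pthread_mutex_unlock(&lock);
    for (int w = 0; w < NWORKERS; w++)
        pthread_join(tid[w], NULL);
    return total;
}
```

Calling run_rounds(3) from main() runs three start/finish rounds and returns
12288.0 with the filler data above (4 workers times 1024 ones, three times).
Build with -pthread. Timing the same run with and without the pin_self() call
is one way to check whether affinity matters for a given workload.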
Received on Wed Sep 17 2008 - 19:24:34 UTC
