Re: Heavy I/O blocks FreeBSD box for several seconds

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Thu, 07 Jul 2011 10:27:53 +0300

on 06/07/2011 21:11 Nathan Whitehorn said the following:
> On 07/06/11 13:00, Steve Kargl wrote:
>> AFAICT, it is a cpu affinity issue.  If I launch n+1 MPI images
>> on a system with n cpus/cores, then 2 (and sometimes 3) images
>> are stuck on a cpu and those 2 (or 3) images ping-pong on that
>> cpu.  I recall trying to use renice(8) to force some load
>> balancing, but vaguely remember that it did not help.
> 
> I've seen exactly this problem with multi-threaded math libraries, as well.

Exactly the same?  Let's see.

> Using parallel GotoBLAS on FreeBSD gives terrible performance because the
> threads keep migrating between CPUs, causing frequent cache misses.

So Steve reports that if he has Nthr > Ncpu, then some threads are "over-glued"
to a particular CPU, which results in sub-optimal scheduling for those threads.
I have to guess that Steve would want to see the threads being shuffled between
CPUs to produce more even CPU load.

On the other hand, you report that your threads keep being shuffled between CPUs
(I presume for Nthr == Ncpu case, where Nthr is a count of the number-crunching
threads).  And I guess that you want them to stay glued to particular CPUs.

So how is this the same problem?  In fact, it sounds like somewhat the opposite.
The only thing in common is that you both don't like how ULE works.

ULE has many knobs to tune its behavior.  Unfortunately they are not very well
documented and there are too many of them.  So, it's not easy to find which
combination would be the best for a particular workload.  In your particular
case you might want to try increasing the value of the kern.sched.affinity
sysctl to increase the affinity of threads to their CPUs.

Also, please note that FreeBSD support in GotoBLAS is not equivalent to Linux
support as I have pointed out before.  On Linux they bind their threads to CPUs
to avoid the situation that you describe.  Apparently they didn't know how to do
CPU-binding on FreeBSD, so this is not implemented.  You may have a motivation
to help them out with this.
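
For illustration, here is a minimal sketch of what per-thread CPU binding
could look like, assuming cpuset_setaffinity(2) with CPU_WHICH_TID on FreeBSD
and sched_setaffinity(2) on Linux.  This is not GotoBLAS code; the helper
names first_allowed_cpu() and bind_self_to_cpu() are hypothetical, and a real
library would choose a distinct CPU for each worker thread.

```c
#define _GNU_SOURCE             /* exposes sched_setaffinity() and CPU_* on Linux */
#include <assert.h>

#if defined(__FreeBSD__)
#include <sys/param.h>
#include <sys/cpuset.h>
typedef cpuset_t cpu_mask_t;
#elif defined(__linux__)
#include <sched.h>
typedef cpu_set_t cpu_mask_t;
#endif

/* Hypothetical helper: lowest CPU the calling thread may run on, or -1. */
int first_allowed_cpu(void)
{
#if defined(__FreeBSD__) || defined(__linux__)
    cpu_mask_t mask;
    int rc, cpu;

    CPU_ZERO(&mask);
#if defined(__FreeBSD__)
    /* An id of -1 with CPU_WHICH_TID refers to the calling thread. */
    rc = cpuset_getaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
                            sizeof(mask), &mask);
#else
    /* A pid of 0 refers to the calling thread. */
    rc = sched_getaffinity(0, sizeof(mask), &mask);
#endif
    if (rc != 0)
        return -1;
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &mask))
            return cpu;
#endif
    return -1;
}

/* Hypothetical helper: pin the calling thread to one CPU.
 * Returns 0 on success and -1 on failure or on an unsupported platform. */
int bind_self_to_cpu(int cpu)
{
#if defined(__FreeBSD__) || defined(__linux__)
    cpu_mask_t mask;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
#if defined(__FreeBSD__)
    return cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
                              sizeof(mask), &mask) == 0 ? 0 : -1;
#else
    return sched_setaffinity(0, sizeof(mask), &mask) == 0 ? 0 : -1;
#endif
#else
    (void)cpu;
    return -1;
#endif
}
```

Each number-crunching thread would call bind_self_to_cpu() once at startup
with its own CPU index; binding only makes sense for the Nthr <= Ncpu case.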

-- 
Andriy Gapon
Received on Thu Jul 07 2011 - 05:27:56 UTC