on 06/07/2011 21:11 Nathan Whitehorn said the following:
> On 07/06/11 13:00, Steve Kargl wrote:
>> AFAICT, it is a cpu affinity issue.  If I launch n+1 MPI images
>> on a system with n cpus/cores, then 2 (and sometimes 3) images
>> are stuck on a cpu and those 2 (or 3) images ping-pong on that
>> cpu.  I recall trying to use renice(8) to force some load
>> balancing, but vaguely remember that it did not help.
>
> I've seen exactly this problem with multi-threaded math libraries, as well.

Exactly the same?  Let's see.

> Using parallel GotoBLAS on FreeBSD gives terrible performance because the
> threads keep migrating between CPUs, causing frequent cache misses.

So Steve reports that if he has Nthr > Ncpu, then some threads are
"over-glued" to a particular CPU, which results in sub-optimal scheduling
for those threads.  I have to guess that Steve would want to see the
threads being shuffled between CPUs to produce a more even CPU load.

On the other hand, you report that your threads keep being shuffled
between CPUs (I presume for the Nthr == Ncpu case, where Nthr is the
count of the number-crunching threads).  And I guess that you want them
to stay glued to particular CPUs.

So how is this the same problem?  In fact, it sounds more like the
opposite.  The only thing the two cases have in common is that neither
of you likes how ULE behaves.

ULE has many knobs to tune its behavior.  Unfortunately they are not
very well documented and there are too many of them, so it is not easy
to find which combination would be best for a particular workload.  In
your particular case you might want to try increasing the value of
kern.sched.affinity to increase the affinity of threads to their CPUs.

Also, please note that FreeBSD support in GotoBLAS is not equivalent to
Linux support, as I have pointed out before.  On Linux they bind their
threads to CPUs to avoid the situation that you describe.  Apparently
they didn't know how to do CPU binding on FreeBSD, so this is not
implemented.  You may have a motivation to help them out with this.

--
Andriy Gapon
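P.S.  For reference: the knob can be changed at run time with, e.g.,
"sysctl kern.sched.affinity=<new value>".  And a minimal sketch of how a
thread could bind itself to a CPU on FreeBSD via cpuset_setaffinity(2)
might look like the code below; the pin_to_cpu()/worker() names and the
CPU numbering are purely illustrative, this is not GotoBLAS code.

#include <sys/param.h>
#include <sys/cpuset.h>
#include <pthread.h>
#include <stdio.h>

/*
 * Pin the calling thread to a single CPU.  CPU_LEVEL_WHICH plus
 * CPU_WHICH_TID with an id of -1 means "the calling thread".
 */
static int
pin_to_cpu(int cpu)
{
	cpuset_t mask;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	return (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID,
	    -1, sizeof(mask), &mask));
}

/* Hypothetical worker: the compute thread pins itself before crunching. */
static void *
worker(void *arg)
{
	int cpu = *(int *)arg;

	if (pin_to_cpu(cpu) != 0)
		perror("cpuset_setaffinity");
	/* ... number-crunching loop would go here ... */
	return (NULL);
}

int
main(void)
{
	pthread_t t;
	int cpu = 0;	/* bind the single demo worker to CPU 0 */

	pthread_create(&t, NULL, worker, &cpu);
	pthread_join(t, NULL);
	return (0);
}

(Build with "cc -o pin pin.c -lpthread"; a real library would of course
hand each worker its own CPU number.)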