Re: Heavy I/O blocks FreeBSD box for several seconds

From: Andriy Gapon <avg_at_FreeBSD.org> Date: Thu, 07 Jul 2011 10:13:31 +0300

on 06/07/2011 21:00 Steve Kargl said the following:
> On Wed, Jul 06, 2011 at 05:05:41PM +0000, Poul-Henning Kamp wrote:
>> In message <20110706170132.GA68775_at_troutmask.apl.washington.edu>, Steve Kargl
>> writes:
>>
>>> I periodically ran the same type test in the 2008 post over the
>>> last three years.  Nothing has changed.  I even set up an account
>>> on one node in my cluster for jeffr to use.  He was too busy to
>>> investigate at that time.
>>
>> Isn't this just the lemming-syncer hurling every dirty block over
>> the cliff at the same time ?
> 
> I don't know the answer.  Of course, having no experience in
> process scheduling, I don't understand the question either ;-)

I think that Poul-Henning was speaking in the vein of the subject line where I/O
is somehow involved.
I admit I would also love to hear more details in more technical terms (without
lemmings and cliffs) :-)

> AFAICT, it is a cpu affinity issue.  If I launch n+1 MPI images
> on a system with n cpus/cores, then 2 (and sometimes 3) images
> are stuck on a cpu and those 2 (or 3) images ping-pong on that
> cpu.  I recall trying to use renice(8) to force some load 
> balancing, but vaguely remember that it did not help.

Your issue seems to be about a specific case of purely CPU-bound loads.
It is very relevant to ULE, but perhaps not to this particular thread.

>> To find out:  Run gstat and keep an eye on the leftmost column
>>
>> The road map for fixing that has been known for years...

I would love to hear more about this.
A link to a past discussion, if any, would suffice.
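
For anyone trying the gstat check quoted above: the leftmost column of
gstat output is L(q), the number of requests queued for each GEOM provider.
Below is a minimal sketch, not a definitive tool, of pulling that column out
of batch-mode output. The SAMPLE text, the function names, and the threshold
of 10 are all made up for illustration; none of them come from this thread.

```python
# Sketch: extract the leftmost gstat column, L(q), from batch-mode output.
# SAMPLE is invented illustrative output, not data from this thread.
SAMPLE = """\
dT: 1.001s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      0      0      0    0.0      0      0    0.0    0.0| cd0
   87    412      3     48   12.1    409  26112  198.4  100.2| ada0
   87    412      3     48   12.1    409  26112  198.4  100.2| ada0s1
"""


def queue_lengths(text):
    """Return {provider_name: L(q)} parsed from gstat-style rows."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with an integer queue length; header lines do not.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        result[fields[-1]] = int(fields[0])
    return result


def backlogged(text, threshold=10):
    """Providers whose queue length exceeds a (hypothetical) threshold."""
    return sorted(n for n, q in queue_lengths(text).items() if q > threshold)


if __name__ == "__main__":
    print(backlogged(SAMPLE))
```

On a FreeBSD box the same functions could presumably be fed the output of
"gstat -b" captured while the stall is happening. If L(q) jumps on the busy
disk at the moment the system freezes, that would at least be consistent
with a burst of queued writes rather than a scheduler problem.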

-- 
Andriy Gapon