On May 19, 2009, at 5:40 AM, Attilio Rao wrote:

> 2009/5/19 Ben Kelly <ben_at_wanderview.com>:
>> On May 18, 2009, at 1:38 PM, Attilio Rao wrote:
>>>
>>> OMG.
>>> This still doesn't explain priorities like 49 or such seen in the
>>> first report as long as we don't set priorities by hand,
>>
>> I'm trying to understand why this particular priority value is so
>> concerning, but I'm a little bit confused. Can you elaborate on why
>> you think it's a problem? From previous off-list e-mails I get the
>> impression that you are concerned that it does not fall on an RQ_PPQ
>> boundary. Is this the case? Again, I may be completely confused, but
>> ULE does not seem to consider RQ_PPQ when it assigns priorities for
>> interactive threads. Here is how I came to this conclusion:
>
> I'm concerned because the first starvation I saw in this thread was
> caused by the priority being lowered inappropriately (it was 49 against
> 45 IIRC). 49 means that the thread will never be chosen while the 45s
> are still in the runqueue. I'm not concerned about RQ_PPQ boundaries.

Ah, ok. Sorry for my confusion. I guess the condition seemed somewhat reasonable to me because the behavior of the 45s probably looks very interactive to the scheduler. The user threads wake up, see that there is no space in the ARC, signal the txg threads, then sleep. The txg threads then wake up, see that the spa_zio threads are not done, signal all the user threads, then sleep. They bounce back and forth like this very quickly while waiting for data to be flushed to disk. (On my system this can take a while, since my backup pool is on a set of encrypted external USB drives.) It seems likely that their runtime and sleeptime values are balanced in a way that makes the scheduler mark them as high-priority interactive threads.

So to me the interprocess communication within zfs appears to be somewhat brain-damaged in low-memory conditions, but I do not think it points to a problem in the scheduler.
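Attilio's point that a priority-49 thread is never chosen while the 45s are runnable follows from how a strict-priority run queue is scanned. The following is a simplified userland model, not the kernel code: it assumes the FreeBSD layout of 64 queues with RQ_PPQ = 4 priorities per queue, and assumes both threads sit on such a strict-priority queue. The struct and function names (`runq_sketch`, `queue_index`, `runq_add_pri`, `runq_choose_idx`) are hypothetical. Under these assumptions, 45 and 49 land in different queues (11 and 12), and the scan always serves the lowest-numbered non-empty queue.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants modeled on FreeBSD's run queue layout:
 * 256 priorities spread over 64 queues, 4 priorities per queue. */
#define RQ_NQS 64
#define RQ_PPQ 4

/* Hypothetical simplified run queue: a per-queue thread count plus a
 * 64-bit status bitmap with one bit set for each non-empty queue. */
struct runq_sketch {
    uint64_t status;
    int count[RQ_NQS];
};

/* Map a priority to its queue index (lower number = better priority). */
static int queue_index(int pri)
{
    return pri / RQ_PPQ;
}

/* Enqueue one thread of the given priority and mark its queue non-empty. */
static void runq_add_pri(struct runq_sketch *rq, int pri)
{
    assert(pri >= 0 && pri < RQ_NQS * RQ_PPQ);
    int idx = queue_index(pri);
    rq->count[idx]++;
    rq->status |= (uint64_t)1 << idx;
}

/* Return the index of the queue that would be served next, or -1 if all
 * queues are empty. The lowest-numbered non-empty queue always wins, so a
 * higher-numbered queue is starved while a lower one stays populated. */
static int runq_choose_idx(const struct runq_sketch *rq)
{
    for (int i = 0; i < RQ_NQS; i++)
        if (rq->status & ((uint64_t)1 << i))
            return i;
    return -1;
}
```

Adding one thread at priority 45 and one at 49 to an empty `runq_sketch` and then calling `runq_choose_idx` yields queue 11. As long as the ping-ponging 45s keep refilling queue 11, queue 12 (and the priority-49 thread in it) is never reached.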
It seems that no matter what algorithm the scheduler uses to determine interactivity, an application will be able to devise a perverse workload that will be misclassified.

Anyway, that was my rough guesstimate of what was happening. If you have time to do a more thorough analysis of the ktr dump, that would be great.

Thanks again for your help!

- Ben

Received on Tue May 19 2009 - 09:14:25 UTC