Re: [patch] zfs livelock and thread priorities

From: Ben Kelly <ben_at_wanderview.com>
Date: Thu, 16 Apr 2009 21:30:21 -0400
On Apr 15, 2009, at 12:35 AM, Artem Belevich wrote:
> I'll give it a try in a few days. I'll let you know how it goes.

Just FYI, I was able to reproduce some of the failures with the
original patch using an SMP VMware image.  The new patch seems to fix
these problems, and I was able to successfully mount a ZFS pool.

> BTW, now that you're tinkering with ZFS threads and priorities, would
> you by any chance have any idea why zfs scrub is so painfully slow on
> -current?
> When I start scrub on my -stable box, it pretty much runs full speed
> -- I can see disks under load all the time.
> However, on -current scrub seems to run in small bursts.  Disks get
> busy for a second or so, then things go quiet for about five seconds,
> and this pattern repeats over and over.

I don't know.  I haven't had to scrub my devices very often.  I ran a
couple of scrubs here locally and did not see the behavior you
describe.  There is a significant delay between typing zpool scrub and
when the disk I/O actually begins, but after that it completes without
pausing.  If I get a chance I'll try to look at what the scrub code is
doing.

Thanks again.

- Ben

> --Artem
>
>
>
> On Tue, Apr 14, 2009 at 7:32 PM, Ben Kelly <ben_at_wanderview.com> wrote:
>> On Apr 14, 2009, at 11:50 AM, Ben Kelly wrote:
>>>
>>> On Apr 13, 2009, at 7:36 PM, Artem Belevich wrote:
>>>>
>>>> Tried your patch that used PRIBIO+{1,2} for priorities with
>>>> -current r191008, and the kernel died with a "spinlock held too
>>>> long" panic.  Actually, there were apparently two instances of the
>>>> panic on different cores.
>>>>
>>>> Here's output of "alltrace" and "ps" after the crash:
>>>> http://pastebin.com/f140f4596
>>>>
>>>> I've reverted the change and the kernel booted just fine.
>>>>
>>>> The box is quad-core with two ZFS pools -- one a single disk, the
>>>> other a two-disk mirror.  FreeBSD is installed on UFS partitions;
>>>> ZFS is used for user stuff only.
>>>
>>> Thanks for the report!
>>>
>>> I don't have a lot of time to look at this today, but it appears
>>> that there is a race condition on SMP machines when setting the
>>> priority immediately after the kproc is spawned.  As a quick hack I
>>> tried adding a pause between the kproc_create() and sched_prio()
>>> calls.  Can you try this patch?
>>>
>>>
>>>  http://www.wanderview.com/svn/public/misc/zfs_livelock/zfs_thread_priority.diff
>>>
>>> I'll try to take a closer look at this later in the week.
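
(For anyone skimming the thread: stripped of the ZFS specifics, the
quick hack amounts to roughly the sketch below.  The zfs_worker_main()
name and the PRIBIO + 1 value are placeholders for illustration, not
the actual diff -- that lives at the URL above.)

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/kthread.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/priority.h>
#include <sys/proc.h>
#include <sys/sched.h>

static void
zfs_worker_main(void *arg)
{
	/* ... worker loop would live here ... */
	kproc_exit(0);
}

static int
start_zfs_worker(void)
{
	struct proc *p;
	struct thread *td;
	int error;

	error = kproc_create(zfs_worker_main, NULL, &p, 0, 0, "zfs_worker");
	if (error != 0)
		return (error);

	/*
	 * Quick hack: give the new kproc a chance to be scheduled for
	 * the first time before touching its priority from this
	 * (external) thread.
	 */
	pause("zfsprio", hz / 10);

	td = FIRST_THREAD_IN_PROC(p);
	thread_lock(td);
	sched_prio(td, PRIBIO + 1);
	thread_unlock(td);

	return (0);
}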
>>
>> Sorry for replying to my own e-mail, but I've updated the patch again
>> with a less hackish approach.  (At the same URL above.)  I added a new
>> kproc_create_priority() function to set the priority of the new
>> thread before it is first scheduled.  This should avoid any SMP races
>> with setting the priority from an external thread.
>>
>> If you would be willing to try the test again with this new patch I
>> would appreciate it.
>>
>> Thanks!
>>
>> - Ben
>>
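
(And the idea behind kproc_create_priority(), in rough strokes: create
the kproc stopped, assign the priority while nothing can run it, and
only then hand it to the scheduler.  The sketch below is one way to get
that effect from the caller's side; the wrapper name, its signature,
and the assumption that kproc_create() skips its own sched_add() when
RFSTOPPED is passed are simplifications -- the real
kproc_create_priority() in the patch may do this differently.)

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kthread.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/proc.h>
#include <sys/sched.h>
#include <sys/unistd.h>

/*
 * Illustrative only: create a kernel process whose first thread already
 * has the requested priority before it ever becomes runnable.
 */
static int
kproc_create_prio(void (*func)(void *), void *arg, struct proc **newpp,
    u_char prio, const char *name)
{
	struct proc *p;
	struct thread *td;
	int error;

	/*
	 * RFSTOPPED asks kproc_create() to leave the new thread off the
	 * run queue, so nothing can run it (or race with us) yet.
	 */
	error = kproc_create(func, arg, &p, RFSTOPPED, 0, "%s", name);
	if (error != 0)
		return (error);

	td = FIRST_THREAD_IN_PROC(p);
	thread_lock(td);
	sched_prio(td, prio);		/* set priority before first run */
	TD_SET_CAN_RUN(td);
	sched_add(td, SRQ_BORING);	/* now let the scheduler have it */
	thread_unlock(td);

	if (newpp != NULL)
		*newpp = p;
	return (0);
}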