Re: [patch] zfs livelock and thread priorities

From: Ben Kelly <ben_at_wanderview.com>
Date: Thu, 30 Apr 2009 11:55:10 -0400

On Apr 30, 2009, at 3:19 AM, Kip Macy wrote:
> I have a system at work that I could lock up within minutes with
> fsstress. With this patch the system is now stable with large numbers
> of fsstress processes running.
>
> Provided I get the heads up from pjd, I will commit it.

I found on my system that I could not zpool export my pool after
running my load test with this patch.  To try to fix this I've updated
the patch to delegate to vrele(9) instead of directly decrementing the
vnode use count.  I also modified the deferred operation to restart a
full vrele(9) instead of calling VOP_INACTIVE, since it occurred to me
that someone else might have grabbed the vnode while our task was on
the queue.  I have only had time to run a short test, but it seems to
avoid the problem so far.
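
To illustrate the race I am worried about, here is a toy userland
model in C.  It is not code from the patch: struct node, node_rele()
and run_deferred() are made-up names, and it assumes the release is
deferred whenever the per-object lock cannot be taken without
blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userland model. Every name here is hypothetical, not from the patch. */
struct node {
    int  usecount;      /* references held, loosely like a vnode use count */
    bool obj_locked;    /* stands in for the per-object hold lock being busy */
    int  inactive_runs; /* number of times the final teardown ran */
};

#define QLEN 8
static struct node *queue[QLEN];   /* pending deferred releases */
static size_t qhead, qtail;

/* Full release path: drop one reference, tear down only on the last one. */
void node_rele(struct node *n)
{
    if (n->usecount > 1) {          /* someone else still holds it */
        n->usecount--;
        return;
    }
    if (n->obj_locked) {            /* cannot block here: keep our */
        queue[qtail++ % QLEN] = n;  /* reference and retry later   */
        return;
    }
    n->usecount--;                  /* last reference is gone */
    n->inactive_runs++;             /* stands in for the inactive step */
}

/* Worker: re-run the full release for entries queued before this call. */
void run_deferred(void)
{
    size_t end = qtail;             /* entries re-queued now wait for next run */

    while (qhead != end) {
        struct node *n = queue[qhead++ % QLEN];
        node_rele(n);               /* not a direct teardown */
    }
}
```

If the worker ran the teardown directly, a node that picked up a second
reference while it sat on the queue would be torn down under its new
user.  Going back through the full release path just drops our own
reference in that case.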

Can you retest with these changes?

Thanks!

- Ben

>
> -Kip
>
>
> On Wed, Apr 29, 2009 at 6:56 PM, Ben Kelly <ben_at_wanderview.com> wrote:
>> On Apr 29, 2009, at 7:47 PM, Lawrence Stewart wrote:
>>>
>>> Ben Kelly wrote:
>>>>
>>>> On Apr 29, 2009, at 7:58 AM, Ben Kelly wrote:
>>>>>
>>>>> On Apr 29, 2009, at 2:43 AM, Jaakko Heinonen wrote:
>>>>>>
>>>>>> On 2009-04-28, Ben Kelly wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> http://www.wanderview.com/svn/public/misc/zfs/zfs_zinactive_deadlock.diff
>>>>>>>
>>>>>>> The patch is updated in the same location above.
>>>>>>
>>>>>> There's a fatal typo in the patch:
>>>>>>
>>>>>> -    ZFS_OBJ_HOLD_ENTER(zfsvfs, z_id);
>>>>>> +    locked == ZFS_OBJ_HOLD_TRYENTER(zfsvfs, z_id);
>>>>>>           ^^^^
>>>>>
>>>>> Yikes!  Thanks for catching this!
>>>>>
>>>>> The patch has been updated at the same URL.  If anyone has patched their
>>>>> system please grab the new version.  Sorry for the confusion.
>>>>
>>>> Argh!  The patch was still broken even after this fix.
>>>> Apparently when I tested my taskqueue solution I forgot to do a make
>>>> installkernel.  For some reason the taskqueue approach deadlocks my server
>>>> at home under normal conditions.  Therefore I have reverted the patch to use
>>>> the simple return.  I still don't think this is the right solution, but I
>>>> don't have time to completely figure out what is going on right now.
>>>> Again, sorry for the mess!
>>>
>>> As far as I can tell, one of the developers is working on a patch to
>>> address the same issue you're discussing in this thread. He ran into it on
>>> his SSD ZFS installation and the symptoms sound likely to be the same as
>>> what you're discussing. I believe he's testing a patch which is inspired by
>>> the one the opensolaris guys used to fix the bug, which you can look at
>>> here:
>>>
>>> http://people.freebsd.org/~pjd/patches/vn_rele_hang.patch
>>>
>>> The opensolaris one has major incompatibilities with FreeBSD so can't be
>>> applied directly.
>>>
>>> As soon as it's ready I think he'll be making it available for wider
>>> testing so stay tuned.
>>>
>>> Cheers,
>>> Lawrence
>>>
>>> PS Apologies if the issue you're working on is not the same as the one
>>> addressed by the opensolaris patch above.
>>
>>
>> Thank you!  This does appear to be the same issue and I look forward to
>> seeing the final fix.
>>
>> For now I've gone ahead and updated my patch with a naive adaptation of the
>> opensolaris diff.  It seems more correct than what I had and I was worried
>> people would waste time testing my broken approach.  I've only been able to
>> test it on my i386, non-SMP server however.
>>
>> Thanks again.
>>
>> - Ben
>> _______________________________________________
>> freebsd-current_at_freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
>>
>
>
>
> -- 
> All that is necessary for the triumph of evil is that good men do nothing.
>    Edmund Burke
> <zfs_async_vrele.diff>
Received on Thu Apr 30 2009 - 13:55:13 UTC
