Re: [patch] zfs livelock and thread priorities

From: Kip Macy <kmacy_at_freebsd.org>
Date: Thu, 30 Apr 2009 00:19:24 -0700

I have a system at work that I could lock up within minutes with
fsstress. With this patch the system is now stable with large numbers
of fsstress processes running.

Provided I get the go-ahead from pjd, I will commit it.

-Kip


On Wed, Apr 29, 2009 at 6:56 PM, Ben Kelly <ben_at_wanderview.com> wrote:
> On Apr 29, 2009, at 7:47 PM, Lawrence Stewart wrote:
>>
>> Ben Kelly wrote:
>>>
>>> On Apr 29, 2009, at 7:58 AM, Ben Kelly wrote:
>>>>
>>>> On Apr 29, 2009, at 2:43 AM, Jaakko Heinonen wrote:
>>>>>
>>>>> On 2009-04-28, Ben Kelly wrote:
>>>>>>>
>>>>>>>
>>>>>>> http://www.wanderview.com/svn/public/misc/zfs/zfs_zinactive_deadlock.diff
>>>>>>
>>>>>> The patch is updated in the same location above.
>>>>>
>>>>> There's a fatal typo in the patch:
>>>>>
>>>>> -    ZFS_OBJ_HOLD_ENTER(zfsvfs, z_id);
>>>>> +    locked == ZFS_OBJ_HOLD_TRYENTER(zfsvfs, z_id);
>>>>>           ^^^^
>>>>
>>>> Yikes!  Thanks for catching this!
>>>>
>>>> The patch has been updated at the same URL.  If anyone has patched their
>>>> system please grab the new version.  Sorry for the confusion.
>>>
>>> Argh!  The patch was still broken even after this fix.
>>> Apparently when I tested my taskqueue solution I forgot to do a make
>>> installkernel.  For some reason the taskqueue approach deadlocks my server
>>> at home under normal conditions.  Therefore I have reverted the patch to use
>>> the simple return.  I still don't think this is the right solution, but I
>>> don't have time to completely figure out what is going on right now.
>>> Again, sorry for the mess!
>>
>> As far as I can tell, one of the developers is working on a patch to
>> address the same issue you're discussing in this thread. He ran into it on
>> his SSD ZFS installation and the symptoms sound likely to be the same as
>> what you're discussing. I believe he's testing a patch which is inspired by
>> the one the opensolaris guys used to fix the bug, which you can look at
>> here:
>>
>> http://people.freebsd.org/~pjd/patches/vn_rele_hang.patch
>>
>> The OpenSolaris one has major incompatibilities with FreeBSD, so it can't
>> be applied directly.
>>
>> As soon as it's ready I think he'll be making it available for wider
>> testing so stay tuned.
>>
>> Cheers,
>> Lawrence
>>
>> PS Apologies if the issue you're working on is not the same as the one
>> addressed by the opensolaris patch above.
>
>
> Thank you!  This does appear to be the same issue and I look forward to
> seeing the final fix.
>
> For now I've gone ahead and updated my patch with a naive adaptation of the
> opensolaris diff.  It seems more correct than what I had and I was worried
> people would waste time testing my broken approach.  I've only been able to
> test it on my i386, non-SMP server however.
>
> Thanks again.
>
> - Ben
>
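
For anyone skimming the quoted thread, here is a rough userland sketch of
why the `==` typo Jaakko pointed out is fatal. It is not the ZFS code:
struct obj_lock, obj_hold_tryenter(), obj_hold_exit() and the two
inactive_*() functions are hypothetical stand-ins, and the only assumption
carried over from the patch is that the try-enter call reports whether the
lock was taken.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a per-object hold lock; not the ZFS type. */
struct obj_lock {
	bool held;
};

/* Hypothetical try-enter: takes the lock and returns true if it was free. */
static bool
obj_hold_tryenter(struct obj_lock *l)
{
	if (l->held)
		return (false);
	l->held = true;
	return (true);
}

/* Hypothetical exit: releases the lock. */
static void
obj_hold_exit(struct obj_lock *l)
{
	l->held = false;
}

/*
 * Buggy variant: '==' compares and throws the result away, so 'locked'
 * keeps its initial value even though the lock may have been taken.
 */
static bool
inactive_buggy(struct obj_lock *l)
{
	bool locked = false;

	(void)(locked == obj_hold_tryenter(l));	/* typo: should be '=' */
	if (!locked)
		return (false);		/* returns with the lock still held */
	obj_hold_exit(l);
	return (true);
}

/* Fixed variant: '=' stores the result, so the lock is always released. */
static bool
inactive_fixed(struct obj_lock *l)
{
	bool locked;

	locked = obj_hold_tryenter(l);
	if (!locked)
		return (false);		/* lock was busy; nothing to release */
	obj_hold_exit(l);
	return (true);
}
```

In this sketch the buggy variant acquires the lock, decides it failed, and
returns without releasing it, so the next caller can never get in. Whether
the real patch failed in exactly this way depends on how `locked` was
declared there, which the thread does not show.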



-- 
All that is necessary for the triumph of evil is that good men do nothing.
    Edmund Burke

Received on Thu Apr 30 2009 - 05:42:45 UTC
