Re: [patch] zfs livelock and thread priorities

From: Ben Kelly <ben_at_wanderview.com> Date: Tue, 28 Apr 2009 17:19:29 -0400

On Apr 28, 2009, at 4:52 PM, Ben Kelly wrote:

> On Apr 28, 2009, at 2:11 PM, Artem Belevich wrote:
>> My system had eventually deadlocked overnight, though it took much
>> longer than before to reach that point.
>>
>> In the end I've got many, many processes sleeping in zio_wait with no
>> disk activity whatsoever.
>> I'm not sure if that's the same issue or not.
>>
>> Here are stack traces for all processes -- http://pastebin.com/f364e1452
>> I've got the core saved, so if you want me to dig out some more info,
>> let me know if/how I could help.
>
> It looks like there is a possible deadlock between zfs_zget() and
> zfs_zinactive().  Both acquire the object lock via
> ZFS_OBJ_HOLD_ENTER().  The zfs_zinactive() path can be called
> indirectly from within zio_done(), while zfs_zget(), holding the
> object lock, can block waiting for zio_done() to complete.
>
> The following patch might help:
>
>  http://www.wanderview.com/svn/public/misc/zfs/zfs_zinactive_deadlock.diff
>
> This simply bails out of the inactive processing if the object lock
> is already held.  I'm not sure whether this is 100% correct, as it
> cannot verify whether the vnode still has references.  I also tried
> running the zfs_zinactive() logic in a taskqueue to avoid the
> deadlock, but that caused other deadlocks to occur.
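For illustration, the bail-out idea from the first patch can be modeled in userspace C. This is a minimal sketch, not the actual diff: it assumes a single pthread mutex stands in for the object lock taken by ZFS_OBJ_HOLD_ENTER(), and the names obj_hold_lock, zinactive_try() and inactive_runs are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the object lock taken by ZFS_OBJ_HOLD_ENTER(). */
static pthread_mutex_t obj_hold_lock = PTHREAD_MUTEX_INITIALIZER;

/* Number of inactive passes that actually ran (for illustration only). */
static int inactive_runs = 0;

/*
 * Model of the bail-out approach: instead of blocking on the object lock
 * (which a zfs_zget()-like caller may hold while waiting on I/O completion),
 * try the lock and skip inactive processing if it is already held.
 * Returns 1 if the inactive work ran, 0 if it was skipped.
 */
static int
zinactive_try(void)
{
    if (pthread_mutex_trylock(&obj_hold_lock) != 0)
        return 0;           /* lock busy: bail out instead of deadlocking */
    inactive_runs++;        /* ... real teardown work would go here ... */
    pthread_mutex_unlock(&obj_hold_lock);
    return 1;
}
```

The weakness mentioned above is visible in the model: when the lock is busy the work is simply skipped, and nothing guarantees it is retried later.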

Sorry to reply to my own mail, but I came up with a better solution
that I think is correct.  I just vref() the vnode and then vrele() it
again from a taskqueue, which restarts the zfs_zinactive() processing
if it's still applicable.
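The vref()/vrele() deferral can be sketched the same way. Again this is a hypothetical userspace model, not the patch itself: struct vnode_model, zinactive_model(), vrele_model() and tq_drain_model() are made-up names, a trylock on a pthread mutex stands in for detecting that the object lock is already held, and a small FIFO drained by the caller stands in for the kernel taskqueue.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of a vnode; none of these names are real VFS/ZFS APIs. */
struct vnode_model {
    pthread_mutex_t obj_lock;  /* stands in for the ZFS_OBJ_HOLD_ENTER() lock */
    int usecount;              /* stands in for the vnode use count */
    int inactivated;           /* number of times inactive processing ran */
};

/* A tiny FIFO standing in for a taskqueue; drained explicitly by the caller. */
#define TQ_MAX 16
static struct vnode_model *tq[TQ_MAX];
static size_t tq_len;

/* Inactive path: run now if the lock is free, otherwise defer. */
static void
zinactive_model(struct vnode_model *vp)
{
    if (pthread_mutex_trylock(&vp->obj_lock) != 0) {
        /* Lock busy: keep the vnode alive (vref) and queue a later vrele. */
        vp->usecount++;
        assert(tq_len < TQ_MAX);
        tq[tq_len++] = vp;
        return;
    }
    vp->inactivated++;         /* ... real teardown work would go here ... */
    pthread_mutex_unlock(&vp->obj_lock);
}

/* Stand-in for vrele(): dropping the last reference triggers inactive. */
static void
vrele_model(struct vnode_model *vp)
{
    if (--vp->usecount == 0)
        zinactive_model(vp);
}

/* Stand-in for the taskqueue thread: run each already-queued vrele once. */
static void
tq_drain_model(void)
{
    size_t n = tq_len;         /* only run tasks queued before this pass */
    while (n-- > 0) {
        struct vnode_model *vp = tq[0];
        for (size_t i = 1; i < tq_len; i++)
            tq[i - 1] = tq[i];
        tq_len--;
        vrele_model(vp);       /* may re-queue vp if the lock is still busy */
    }
}
```

In this model the deferred vrele only reaches inactive processing when it drops the last reference, so the retry does nothing if someone else has picked the vnode up again in the meantime, which corresponds to the "if it's still applicable" case.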

The updated patch is at the same location as above.

Thanks again.

- Ben