Quoting Ben Kelly <ben_at_wanderview.com> (from Tue, 28 Apr 2009 17:19:29 -0400):

> On Apr 28, 2009, at 4:52 PM, Ben Kelly wrote:
>
>> On Apr 28, 2009, at 2:11 PM, Artem Belevich wrote:
>>> My system had eventually deadlocked overnight, though it took much
>>> longer than before to reach that point.
>>>
>>> In the end I've got many many processes sleeping in zio_wait with no
>>> disk activity whatsoever.
>>> I'm not sure if that's the same issue or not.
>>>
>>> Here are stack traces for all processes -- http://pastebin.com/f364e1452
>>> I've got the core saved, so if you want me to dig out some more info,
>>> let me know if/how I could help.
>>
>> It looks like there is a possible deadlock between zfs_zget() and
>> zfs_zinactive(). They both acquire a lock via
>> ZFS_OBJ_HOLD_ENTER(). The zfs_zinactive() path can get called
>> indirectly from within zio_done(). The zfs_zget() can in turn
>> block waiting for zio_done()'s completion while holding the object
>> lock.
>>
>> The following patch might help:
>>
>> http://www.wanderview.com/svn/public/misc/zfs/zfs_zinactive_deadlock.diff
>>
>> This simply bails out of the inactive processing if the object lock
>> is already held. I'm not sure if this is 100% correct or not as it
>> cannot verify there are references to the vnode. I also tried
>> executing the zfs_zinactive() logic in a taskqueue to avoid the
>> deadlock, but that caused other deadlocks to occur.
>
> Sorry to reply to my own mail, but I came up with a better solution
> that I think is correct. I just vref() the vnode and then vrele()
> it again from a taskqueue to restart the zfs_zinactive() processing
> if its still applicable.

This sounds a little bit related to the issues we discussed in the
unlimited arc cache growth thread. Maybe the high value for the arc
cache was a red herring and this is the real problem for the panics /
watchdog triggers I experience on the system in question.

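For readers following the thread, the pattern Ben describes in the quoted mail (do not block on the object lock in the inactive path, and if the lock is held, hand the work to a queue so it is retried later) can be sketched in user-space Python. This is only an illustration of the general technique, not the kernel patch: `obj_lock`, `deferred`, `inactive()` and `drain_deferred()` are hypothetical names, a `threading.Lock` stands in for the lock taken via ZFS_OBJ_HOLD_ENTER(), and a `queue.Queue` stands in for the taskqueue.

```python
import queue
import threading

obj_lock = threading.Lock()   # stand-in for the per-object hold lock (assumption)
deferred = queue.Queue()      # stand-in for a kernel taskqueue (assumption)
log = []                      # records what happened, for inspection


def inactive(obj_id):
    """Hypothetical analogue of the inactive path: never block on the lock."""
    if not obj_lock.acquire(blocking=False):
        # The lock is already held, possibly by a caller that is itself
        # waiting for us to finish. Blocking here could deadlock, so
        # queue the work for a later retry and return immediately.
        deferred.put(obj_id)
        log.append(("deferred", obj_id))
        return False
    try:
        log.append(("reclaimed", obj_id))
        return True
    finally:
        obj_lock.release()


def drain_deferred():
    """Retry queued work from a context that holds no object lock."""
    done = []
    # Only process what is queued right now, so an item that gets
    # deferred again does not make this loop spin forever.
    for _ in range(deferred.qsize()):
        obj_id = deferred.get()
        if inactive(obj_id):
            done.append(obj_id)
    return done
```

Calling `inactive(7)` while `obj_lock` is held returns immediately and queues the object; a later `drain_deferred()` call, made once the lock is free, completes the work. The sketch leaves out the reference counting (vref()/vrele()) that keeps the vnode alive until the retry runs.
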
I'm preparing a kernel with this patch and your zfs-prio patch, but I
don't think I can fully test it this week. If I'm lucky I can install
the new kernel, but I don't think I can put load on the system this
week.

Bye,
Alexander.

--
The length of a marriage is inversely proportional to the amount spent
on the wedding.

http://www.Leidinger.net    Alexander _at_ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild _at_ FreeBSD.org  : PGP ID = 72077137

Received on Wed Apr 29 2009 - 06:44:41 UTC