Re: panic, seems related to r234386

From: Mateusz Guzik <mjguzik_at_gmail.com>
Date: Sun, 13 May 2012 00:49:38 +0200

On Thu, May 10, 2012 at 12:39:00PM +0200, Peter Holm wrote:
> On Thu, May 10, 2012 at 12:21:18PM +0200, Mateusz Guzik wrote:
> > On Tue, May 08, 2012 at 09:45:14PM +0200, Peter Holm wrote:
> > > On Mon, May 07, 2012 at 10:11:53PM +0200, Mateusz Guzik wrote:
> > > > On Mon, May 07, 2012 at 12:28:41PM -0700, Doug Barton wrote:
> > > > > On 05/06/2012 15:19, Sergey Kandaurov wrote:
> > > > > > On 7 May 2012 01:54, Doug Barton <dougb_at_freebsd.org> wrote:
> > > > > >> I got this with today's current, previous (working) kernel is r232719.
> > > > > >>
> > > > > >> panic: _mtx_lock_sleep: recursed on non-recursive mutex struct mount mtx
> > > > > > >> @ /frontier/svn/head/sys/kern/vfs_subr.c:4595
> > > > > 
> > > > > ...
> > > > > 
> > > > > > Please try this patch.
> > > > > > 
> > > > > > Index: fs/ext2fs/ext2_vfsops.c
> > > > > > ===================================================================
> > > > > > --- fs/ext2fs/ext2_vfsops.c     (revision 235108)
> > > > > > +++ fs/ext2fs/ext2_vfsops.c     (working copy)
> > > > > > > @@ -830,7 +830,6 @@
> > > > > >         /*
> > > > > >          * Write back each (modified) inode.
> > > > > >          */
> > > > > > -       MNT_ILOCK(mp);
> > > > > >  loop:
> > > > > >         MNT_VNODE_FOREACH_ALL(vp, mp, mvp) {
> > > > > >                 if (vp->v_type == VNON) {
> > > > > > 
> > > > > 
> > > > > Didn't help, sorry. I put 234385 through some pretty heavy load
> > > > > yesterday, and everything was fine. As soon as I move up to 234386, the
> > > > > panic triggered again. So I cleaned everything up, applied your patch,
> > > > > built a kernel from scratch, and rebooted. It was Ok for a few seconds
> > > > > after boot, then panic'ed again, I think in a different place, but I'm
> > > > > not sure because subsequent attempts to fsck the file systems caused new
> > > > > panics which overwrote the old ones before they could be saved.
> > > > > 
> > > > 
> > > > Another MNT_ILOCK was hiding a few lines below, try this patch:
> > > > 
> > > > http://student.agh.edu.pl/~mjguzik/patches/ext2fs-ilock.patch
> > > > 
> > > > I've tested this a bit and I believe this fixes your problem.
> > > > 
> > > 
> > > Gave this a spin and found what looks like a deadlock:
> > > 
> > > http://people.freebsd.org/~pho/stress/log/ext2fs.txt
> > > 
> > > Not a new problem, it would seem. Same issue with 8.3-PRERELEASE r232656M.
> > > 
> > 
> > pid 2680 (fts) holds lock for vnode cb4be414 and tries to lock cc0ac15c
> > pid 2581 (openat) holds lock for vnode cc0ac15c and tries to lock cb4be414
> > 
> > openat calls rmdir foo/bar and ext2_rmdir unlocks and tries to lock
> > again foo's vnode.
> > 
> > This is fairly easily reproducible with concurrently running mkdir and fts
> > testcase programs that are provided by stress2.
> > 
> > I'll try to come up with a patch by the end of the week.
> > 
> 

An easier way to reproduce: run mkdir from stress2 and, on another terminal,
"while true; do find /mnt > /dev/null; done".

Assuming a foo/bar directory tree, the deadlock happens when bar is being
removed while .. is simultaneously being looked up in bar.

Proposed trivial patch:
http://student.agh.edu.pl/~mjguzik/patches/ext2fs_rmdir-deadlock.patch

If the lock cannot be acquired immediately, it unlocks the 'bar' vnode and
then locks both vnodes in order.
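
In user-space terms the idea looks roughly like the sketch below. This is
not the patch itself: pthread mutexes stand in for vnode locks, and
struct node and lock_parent_of are hypothetical names chosen for
illustration.

```c
#include <pthread.h>

/* Hypothetical stand-in for a vnode; only the lock matters here. */
struct node {
	pthread_mutex_t lock;
};

/*
 * Called with child->lock held.  Returns with both locks held.
 * The return value is 1 if the child lock had to be dropped on the
 * way, 0 if it was held the whole time.
 */
int
lock_parent_of(struct node *parent, struct node *child)
{
	/* Fast path: take the parent without sleeping. */
	if (pthread_mutex_trylock(&parent->lock) == 0)
		return (0);

	/*
	 * Slow path: the parent is busy.  Sleeping on it while still
	 * holding the child is what deadlocks against a thread locking
	 * in the opposite order, so back off and retake both locks in
	 * parent-then-child order.
	 */
	pthread_mutex_unlock(&child->lock);
	pthread_mutex_lock(&parent->lock);
	pthread_mutex_lock(&child->lock);
	return (1);
}
```

The return value matters: once the child has been unlocked, another
thread may have changed or removed it, so a caller that gets 1 back has
to recheck the child before going on.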

After patching this I ran into another issue: "wrong vnode type" panics
from cache_enter_time after calls by ext2_lookup. (It takes some time to
reproduce this; the testcase is the same as before.)

It looks like ext2_lookup is actually an adapted version of ufs_lookup and
lacks some bugfixes present in the current ufs_lookup. I believe those
bugfixes address this bug.

Here is my attempt to fix the problem (based on ufs_lookup changes):
http://student.agh.edu.pl/~mjguzik/patches/ext2fs_lookup-relookup.patch
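
For readers without the patch at hand, the general shape of a relookup
can be shown with a toy, single-threaded model. Everything in it is an
assumption made for illustration (struct dir, lookup_dotdot and the two
simulate_* hooks are hypothetical and not taken from ext2fs or ufs): a
value read before the directory lock is dropped is rechecked once the
lock is held again, and the lookup restarts if it no longer matches.

```c
#include <stddef.h>

/* Hypothetical directory model. */
struct dir {
	int dotdot_ino;	/* inode number that ".." points at */
	int removed;	/* set once the directory has been removed */
};

/* Runs while the directory is "unlocked"; models another thread. */
typedef void (*unlocked_hook)(struct dir *);

/* Sample interference: the directory is moved under a new parent. */
void
simulate_rename(struct dir *dp)
{
	dp->dotdot_ino = 7;
}

/* Sample interference: the directory is removed. */
void
simulate_rmdir(struct dir *dp)
{
	dp->removed = 1;
}

/*
 * Look up ".." in dp.  Returns the parent's inode number, or -1 if
 * dp was removed while it was unlocked.
 */
int
lookup_dotdot(struct dir *dp, unlocked_hook hook)
{
	for (;;) {
		int saved_ino = dp->dotdot_ino;	/* read with dp locked */

		if (hook != NULL) {	/* dp unlocked: others may run */
			hook(dp);
			hook = NULL;	/* interfere only once */
		}

		/* dp locked again: recheck before trusting saved_ino. */
		if (dp->removed)
			return (-1);
		if (dp->dotdot_ino == saved_ino)
			return (saved_ino);
		/* ".." changed while unlocked: look it up again. */
	}
}
```

Without the recheck, the function would hand back whatever it read
before the unlocked window, even if the directory had been moved or
removed in the meantime.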

-- 
Mateusz Guzik <mjguzik gmail.com>