Re: recent changes in Giant usage -> panic

From: Bruce Evans <bde_at_zeta.org.au>
Date: Mon, 15 Mar 2004 01:52:00 +1100 (EST)
On Sun, 14 Mar 2004, Don Lewis wrote:

> I cvsup'ed a few hours ago and did the buildworld and buildkernel drill.
> I installed the new kernel, rebooted, and got this panic while mtree was
> running in installworld.
>
> --------------------------------------------------------------
> >>> Making hierarchy
> --------------------------------------------------------------
> cd /usr/src; /usr/obj/usr/src/make.i386/make -f Makefile.inc1 hierarchy
> cd /usr/src/etc;                /usr/obj/usr/src/make.i386/make distrib-dirs
> mtree -deU  -f /usr/src/etc/mtree/BSD.root.dist -p /
>
> panic: mutex Giant not owned at /usr/src/sys/kern/vfs_subr.c:899
> at line 719 in file /usr/src/sys/kern/kern_mutex.c
> cpuid = 0;
> Debugger("panic")
> Stopped at      Debugger+0x46:  xchgl   %ebx,in_Debugger.0
> db> tr
> Debugger(c07d2da1) at Debugger+0x46
> __panic(c07d217c,2cf,c07d22e8,c07d2423,c07daecf) at __panic+0x13d
> _mtx_assert(c0893860,1,c07daecf,383,c68931b0) at _mtx_assert+0xc2
> vinvalbuf(c6893104,1,0,c6872690,0,0) at vinvalbuf+0x25
> vclean(c6893104,8,c6872690,c6893104,c6893104) at vclean+0x97
> vgonel(c6893104,c6872690,c6893104,0,c07daecf) at vgonel+0x4d
> vgone(c6893104) at vgone+0x28
> pfs_exit(0,c6871898,c648c68c,0,c07d0695) at pfs_exit+0x3f
> exit1(c6872690,0,e7b11d40,c076bca7,c6872690) at exit1+0x2bd
> exit1(c6872690,e7b11d14,1,11,296) at exit1
> syscall(2f,2f,2f,4814a740,bfbfea48) at syscall+0x217
> Xint0x80_syscall() at Xint0x80_syscall+0x1d
> --- syscall (1, FreeBSD ELF32, sys_exit), eip = 0x480c5163, esp = 0xbfbfe9ac, ebp = 0xbfbfe9c8 ---

vgone() has interesting locking problems.  It seems to be a fundamentally
broken interface that only works if the kernel is not premptible or
possibly if everything related to vgone() is locked by Giant.  Apparently
the lower levels know that Giant locking is needed.

Look at this code in ufs_mknod9):

% 	/*
% 	 * Remove inode, then reload it through VFS_VGET so it is
% 	 * checked to see if it is an alias of an existing entry in
% 	 * the inode cache.
% 	 */
% 	vput(*vpp);

Here we've completely released the vnode, so references to it before
we vget() it again use a garbage pointer, since we may be preempted and
the vnode may be recycled.

% 	(*vpp)->v_type = VNON;

Here we reference it.

% 	ino = ip->i_number;	/* Save this before vgone() invalidates ip. */

Here we reference something hanging off it.

% 	vgone(*vpp);

Here we do considerably more with it, without holding any lock on it to
begin with.

% 	error = VFS_VGET(ap->a_dvp->v_mount, ino, LK_EXCLUSIVE, vpp);

This and the rest is OK.

% 	if (error) {
% 		*vpp = NULL;
% 		return (error);
% 	}
% 	return (0);
% this code in

I noticed this not due to losing a race, but because I have active
code in vput() that recycles certain unwanted vnodes as soon as possible
(actually sooner than possible due to the bug).  The vnode gets
vgone()'ed in vput() and comes back with type VBAD (so perhaps I'm
mistaken that it was completely free).  Then vgone()'ing it again does
bad things (deadlock in deadfs).  I work around this using:

%%%
Index: ufs_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
retrieving revision 1.238
diff -u -2 -r1.238 ufs_vnops.c
--- ufs_vnops.c	11 Mar 2004 18:50:33 -0000	1.238
+++ ufs_vnops.c	12 Mar 2004 14:04:04 -0000
_at__at_ -246,7 +258,9 _at__at_
 	 */
 	vput(*vpp);
-	(*vpp)->v_type = VNON;
 	ino = ip->i_number;	/* Save this before vgone() invalidates ip. */
-	vgone(*vpp);
+	if ((*vpp)->v_type != VBAD) {
+		(*vpp)->v_type = VNON;
+		vgone(*vpp);
+	}
 	error = VFS_VGET(ap->a_dvp->v_mount, ino, LK_EXCLUSIVE, vpp);
 	if (error) {
%%%

Perhaps the correct fix is to remove vgone() and always use vgonel().
And Giant locking.

Bruce
Received on Sun Mar 14 2004 - 05:52:07 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:47 UTC