Re: kqueue LOR

From: Kostik Belousov <kostikbel_at_gmail.com>
Date: Tue, 12 Dec 2006 15:52:51 +0200
On Tue, Dec 12, 2006 at 11:49:42PM +1100, Bruce Evans wrote:
> On Tue, 12 Dec 2006, Kostik Belousov wrote:
> 
> >On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote:
> 
> >>Is the mount lock really required, if all we're doing is a single read of 
> >>a
> >>single word (mnt_kern_flags) (v_mount should be read-only for the whole
> >>lifetime of the vnode, I believe)? After all, reads of a single word are
> >>atomic on all our supported architectures.
> >>The only situation I see where there MIGHT be problems are forced 
> >>unmounts,
> >>but I think there are bigger issues with those.
> >>Sorry for noticing this email only now.
> >
> >The problem is real with snapshotting. Ignoring
> >MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of
> >mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode
> >inactivation time. This was the big trouble with nfsd and snapshots. As
> >such, I think that precise value of mmnt_kern_flag is critical there,
> >and mount interlock is needed.
> 
> Locking for just read is almost always bogus, but here (as in most
> cases) there is also a write based on the contents of the flag, and
> the lock is held across the write.
> 
> >Practically speaking, I agree with claim that reading of m_k_f is
> >surrounded by enough locked operations that would make sure that
> >the read value is not stale. But there is no such guarantee on
> >future/non-i386 arches, isn't it ?
> 
> I think not-very-staleness is implied by acquire/release semantics
> which are part of the API for most atomic operations.  This behaviour
> doesn't seem to be documented for mutexes, but I don't see how mutexes
> could work without it (they have to synchronize all memory accesses,
> not just the memory accessed by the lock).
> 
> >As a side note, mount interlock scope could be reduced there.
> >
> >Index: ufs/ufs/ufs_vnops.c
> >===================================================================
> >RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
> >retrieving revision 1.283
> >diff -u -r1.283 ufs_vnops.c
> >--- ufs/ufs/ufs_vnops.c	6 Nov 2006 13:42:09 -0000	1.283
> >+++ ufs/ufs/ufs_vnops.c	12 Dec 2006 10:18:04 -0000
> >_at__at_ -133,19 +134,19 _at__at_
> >{
> >	struct inode *ip;
> >	struct timespec ts;
> >-	int mnt_locked;
> >
> >	ip = VTOI(vp);
> >-	mnt_locked = 0;
> >	if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0) {
> >		VI_LOCK(vp);
> >		goto out;
> >	}
> >	MNT_ILOCK(vp->v_mount);		/* For reading of mnt_kern_flags. */
> >-	mnt_locked = 1;
> >	VI_LOCK(vp);
> >-	if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
> >-		goto out_unl;
> >+	if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) {
> >+		MNT_IUNLOCK(vp->v_mount);
> >+		VI_UNLOCK(vp);
> >+		return;
> >+	}
> 
> The version that depends on not-very-staleness would test the flags
> without acquiring the lock(s) and return immediately in the usual case
> where none of the flags are set.  It would have to acquire the locks
> and repeat the test to make changes (and the test is already repeated
> one flag at a time).  I think this would be correct enough, but still
> inefficient and/or even messier.  The current organization is usually:
> 
>     acquire vnode interlock in caller
>     release vnode interlock in caller to avoid messes here (inefficient)
>     call
>         acquire mount interlock
>         acquire vnode interlock
>         test the flags; goto cleanup code if none set (usual case)
>         do the work
>         release vnode interlock
>         release mount interlock
>         return
>     acquire vnode interlock (if needed)
>     release vnode interlock (if needed)
> 
> and it might become:
> 
>     acquire vnode interlock in caller
>     call
>         test the flags; return if none set (usual case)
>         release vnode interlock	// check that callers are aware of 
>         this
>         acquire mount interlock
>         acquire vnode interlock
>         do the work
> 	// Assume no LOR problem for release, as below.
> 	// Otherwise need another relese+acquire of vnode interlock.
>         release mount interlock
>         return
>     release vnode interlock
> 
> >
> >	if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp))
> >		ip->i_flag |= IN_LAZYMOD;
> >_at__at_ -155,6 +156,7 _at__at_
> >		ip->i_flag |= IN_MODIFIED;
> >	else if (ip->i_flag & IN_ACCESS)
> >		ip->i_flag |= IN_LAZYACCESS;
> >+	MNT_IUNLOCK(vp->v_mount);
> >	vfs_timestamp(&ts);
> >	if (ip->i_flag & IN_ACCESS) {
> >		DIP_SET(ip, i_atime, ts.tv_sec);
> 
> Is there no LOR problem for release?
AFAIK, no. Release is guaranteed to success without blocking.

> 
> As I understand it, MNT_ILOCK() is only protecting IN_ACCESS being
> converted to IN_MODIFED, so after this conversion is done the lock
> is not needed.  Is this correct?
The code shall set IN_MODIFIED flag only if MNTK_SUSPEND and
MNTK_SUSPENDED are clean.

The vfs_write_suspend() that set MNTK_SUSPEND, does VFS_SYNC() after setting
the flag. Since VFS_SYNC() tries to interlock the vnode (to test the flags),
and then locks the vnode for ffs_syncvnode() call, the loop in ffs_sync()
cannot finish before the ufs_itimes released vnode interlock. The
MNTK_SUSPENDED is set after the loop.

Hmm, may be, since vnode must be interlocked by ffs_sync() after
MNTK_SUSPENDED set, and before MNTK_SUSPEND, mount interlock is not
needed in ufs_itimes.

Tor ?

> 
> >_at__at_ -172,10 +174,7 _at__at_
> >
> > out:
> >	ip->i_flag &= ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
> >- out_unl:
> >	VI_UNLOCK(vp);
> >-	if (mnt_locked)
> >-		MNT_IUNLOCK(vp->v_mount);
> >}
> >
> >/*
> >
> 
> BTW, vfs.lookup_shared defaults to 0 and decides shared access for all
> operations including read, so I wonder if there are [m]any bugs
> preventing shared accesses being the default for the most important
> cases where they can be (lookup and reads).  (As you know, not acquiring
> the vnode interlock in old versions of the above was one, and since
> vfs.lookup_shared is global and not all file systems have the fix, it
> still is one.)

Kris Kennaway said that lookup_shared caused deadlocks ?

Received on Tue Dec 12 2006 - 12:54:00 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:03 UTC