On Tue, Dec 12, 2006 at 11:49:42PM +1100, Bruce Evans wrote: > On Tue, 12 Dec 2006, Kostik Belousov wrote: > > >On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote: > > >>Is the mount lock really required, if all we're doing is a single read of > >>a > >>single word (mnt_kern_flags) (v_mount should be read-only for the whole > >>lifetime of the vnode, I believe)? After all, reads of a single word are > >>atomic on all our supported architectures. > >>The only situation I see where there MIGHT be problems are forced > >>unmounts, > >>but I think there are bigger issues with those. > >>Sorry for noticing this email only now. > > > >The problem is real with snapshotting. Ignoring > >MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of > >mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode > >inactivation time. This was the big trouble with nfsd and snapshots. As > >such, I think that precise value of mmnt_kern_flag is critical there, > >and mount interlock is needed. > > Locking for just read is almost always bogus, but here (as in most > cases) there is also a write based on the contents of the flag, and > the lock is held across the write. > > >Practically speaking, I agree with claim that reading of m_k_f is > >surrounded by enough locked operations that would make sure that > >the read value is not stale. But there is no such guarantee on > >future/non-i386 arches, isn't it ? > > I think not-very-staleness is implied by acquire/release semantics > which are part of the API for most atomic operations. This behaviour > doesn't seem to be documented for mutexes, but I don't see how mutexes > could work without it (they have to synchronize all memory accesses, > not just the memory accessed by the lock). > > >As a side note, mount interlock scope could be reduced there. > > > >Index: ufs/ufs/ufs_vnops.c > >=================================================================== > >RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v > >retrieving revision 1.283 > >diff -u -r1.283 ufs_vnops.c > >--- ufs/ufs/ufs_vnops.c 6 Nov 2006 13:42:09 -0000 1.283 > >+++ ufs/ufs/ufs_vnops.c 12 Dec 2006 10:18:04 -0000 > >_at__at_ -133,19 +134,19 _at__at_ > >{ > > struct inode *ip; > > struct timespec ts; > >- int mnt_locked; > > > > ip = VTOI(vp); > >- mnt_locked = 0; > > if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0) { > > VI_LOCK(vp); > > goto out; > > } > > MNT_ILOCK(vp->v_mount); /* For reading of mnt_kern_flags. */ > >- mnt_locked = 1; > > VI_LOCK(vp); > >- if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) > >- goto out_unl; > >+ if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) { > >+ MNT_IUNLOCK(vp->v_mount); > >+ VI_UNLOCK(vp); > >+ return; > >+ } > > The version that depends on not-very-staleness would test the flags > without acquiring the lock(s) and return immediately in the usual case > where none of the flags are set. It would have to acquire the locks > and repeat the test to make changes (and the test is already repeated > one flag at a time). I think this would be correct enough, but still > inefficient and/or even messier. The current organization is usually: > > acquire vnode interlock in caller > release vnode interlock in caller to avoid messes here (inefficient) > call > acquire mount interlock > acquire vnode interlock > test the flags; goto cleanup code if none set (usual case) > do the work > release vnode interlock > release mount interlock > return > acquire vnode interlock (if needed) > release vnode interlock (if needed) > > and it might become: > > acquire vnode interlock in caller > call > test the flags; return if none set (usual case) > release vnode interlock // check that callers are aware of > this > acquire mount interlock > acquire vnode interlock > do the work > // Assume no LOR problem for release, as below. > // Otherwise need another relese+acquire of vnode interlock. > release mount interlock > return > release vnode interlock > > > > > if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp)) > > ip->i_flag |= IN_LAZYMOD; > >_at__at_ -155,6 +156,7 _at__at_ > > ip->i_flag |= IN_MODIFIED; > > else if (ip->i_flag & IN_ACCESS) > > ip->i_flag |= IN_LAZYACCESS; > >+ MNT_IUNLOCK(vp->v_mount); > > vfs_timestamp(&ts); > > if (ip->i_flag & IN_ACCESS) { > > DIP_SET(ip, i_atime, ts.tv_sec); > > Is there no LOR problem for release? AFAIK, no. Release is guaranteed to success without blocking. > > As I understand it, MNT_ILOCK() is only protecting IN_ACCESS being > converted to IN_MODIFED, so after this conversion is done the lock > is not needed. Is this correct? The code shall set IN_MODIFIED flag only if MNTK_SUSPEND and MNTK_SUSPENDED are clean. The vfs_write_suspend() that set MNTK_SUSPEND, does VFS_SYNC() after setting the flag. Since VFS_SYNC() tries to interlock the vnode (to test the flags), and then locks the vnode for ffs_syncvnode() call, the loop in ffs_sync() cannot finish before the ufs_itimes released vnode interlock. The MNTK_SUSPENDED is set after the loop. Hmm, may be, since vnode must be interlocked by ffs_sync() after MNTK_SUSPENDED set, and before MNTK_SUSPEND, mount interlock is not needed in ufs_itimes. Tor ? > > >_at__at_ -172,10 +174,7 _at__at_ > > > > out: > > ip->i_flag &= ~(IN_ACCESS | IN_CHANGE | IN_UPDATE); > >- out_unl: > > VI_UNLOCK(vp); > >- if (mnt_locked) > >- MNT_IUNLOCK(vp->v_mount); > >} > > > >/* > > > > BTW, vfs.lookup_shared defaults to 0 and decides shared access for all > operations including read, so I wonder if there are [m]any bugs > preventing shared accesses being the default for the most important > cases where they can be (lookup and reads). (As you know, not acquiring > the vnode interlock in old versions of the above was one, and since > vfs.lookup_shared is global and not all file systems have the fix, it > still is one.) Kris Kennaway said that lookup_shared caused deadlocks ?
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:03 UTC