On Fri, Jun 03, 2016 at 07:23:47PM -0700, Mark Johnston wrote: > Hi, > > I've recently observed a hang in a multi-threaded process that had hit > an assertion failure and was attempting to dump core. One thread was > sleeping interruptibly on an advisory lock with TDF_SBDRY set (our > filesystem sets VFCF_SBDRY). SIGABRT caused the receipient thread to > suspend other threads with thread_single(SINGLE_NO_EXIT), which fails > to interrupt the sleeping thread, resulting in the hang. > > My question is, why does the SA_CORE handler not force all threads to > the user boundary before attempting to dump core? It must do so later > anyway in order to exit. As I understand it, TDF_SBDRY is intended to > avoid deadlocks that can occur when stopping a process, but in this > case we don't stop the process with the intention of resuming it, so it > seems erroneous to apply this flag. Does your fs both set TDF_SBDRY and call lf_advlock()/lf_advlockasync() ? This cannot work, regardless of the mode of single-threading. TDF_SBDRY makes such sleep non-interruptible by any single-threading request, on the promise that the thread owns some resources (typically vnode locks). I.e. changing the mode would not help. I see two reasons to use SINGLE_NO_EXIT for coredumping. It allows coredump writer to record more exact state of the process into the notes. Another one is that SINGLE_NO_EXIT is generally faster and more reliable than SINGLE_BOUNDARY. Some states are already good enough for SINGLE_NO_EXIT, while require more work to get into SINGLE_BOUNDARY. In other words, core dump write starts earlier. It might be not very significant reasons. >From what I see in the code, our NFS client has similar issue of calling lf_advlock() with TDF_SBDRY set. Below is the patch to fix that. Similar bug existed in our fifofs, see r277321. diff --git a/sys/fs/nfsclient/nfs_clvnops.c b/sys/fs/nfsclient/nfs_clvnops.c index 2a8afa9..98625ee 100644 --- a/sys/fs/nfsclient/nfs_clvnops.c +++ b/sys/fs/nfsclient/nfs_clvnops.c _at__at_ -2992,7 +2992,7 _at__at_ nfs_advlock(struct vop_advlock_args *ap) struct proc *p = (struct proc *)ap->a_id; struct thread *td = curthread; /* XXX */ struct vattr va; - int ret, error = EOPNOTSUPP; + int sbdry, ret, error = EOPNOTSUPP; u_quad_t size; if (NFS_ISV4(vp) && (ap->a_flags & (F_POSIX | F_FLOCK)) != 0) { _at__at_ -3087,7 +3087,10 _at__at_ nfs_advlock(struct vop_advlock_args *ap) if ((VFSTONFS(vp->v_mount)->nm_flag & NFSMNT_NOLOCKD) != 0) { size = VTONFS(vp)->n_size; NFSVOPUNLOCK(vp, 0); + sbdry = sigallowstop(); error = lf_advlock(ap, &(vp->v_lockf), size); + if (sbdry) + sigdeferstop(); } else { if (nfs_advlock_p != NULL) error = nfs_advlock_p(ap); _at__at_ -3114,7 +3117,7 _at__at_ nfs_advlockasync(struct vop_advlockasync_args *ap) { struct vnode *vp = ap->a_vp; u_quad_t size; - int error; + int error, sbdry; if (NFS_ISV4(vp)) return (EOPNOTSUPP); _at__at_ -3124,7 +3127,10 _at__at_ nfs_advlockasync(struct vop_advlockasync_args *ap) if ((VFSTONFS(vp->v_mount)->nm_flag & NFSMNT_NOLOCKD) != 0) { size = VTONFS(vp)->n_size; NFSVOPUNLOCK(vp, 0); + sbdry = sigallowstop(); error = lf_advlockasync(ap, &(vp->v_lockf), size); + if (sbdry) + sigdeferstop(); } else { NFSVOPUNLOCK(vp, 0); error = EOPNOTSUPP;Received on Sat Jun 04 2016 - 07:32:41 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:05 UTC