Re: thread suspension when dumping core

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Sat, 4 Jun 2016 12:32:36 +0300
On Fri, Jun 03, 2016 at 07:23:47PM -0700, Mark Johnston wrote:
> Hi,
> 
> I've recently observed a hang in a multi-threaded process that had hit
> an assertion failure and was attempting to dump core. One thread was
> sleeping interruptibly on an advisory lock with TDF_SBDRY set (our
> filesystem sets VFCF_SBDRY). SIGABRT caused the recipient thread to
> suspend other threads with thread_single(SINGLE_NO_EXIT), which fails
> to interrupt the sleeping thread, resulting in the hang.
> 
> My question is, why does the SA_CORE handler not force all threads to
> the user boundary before attempting to dump core? It must do so later
> anyway in order to exit. As I understand it, TDF_SBDRY is intended to
> avoid deadlocks that can occur when stopping a process, but in this
> case we don't stop the process with the intention of resuming it, so it
> seems erroneous to apply this flag.

Does your fs both set TDF_SBDRY and call lf_advlock()/lf_advlockasync()?
This cannot work, regardless of the single-threading mode.  TDF_SBDRY
makes such a sleep non-interruptible by any single-threading request, on
the promise that the thread owns some resources (typically vnode locks).
I.e., changing the mode would not help.

I see two reasons to use SINGLE_NO_EXIT for coredumping.  First, it allows
the coredump writer to record a more exact state of the process in the notes.

Second, SINGLE_NO_EXIT is generally faster and more reliable than
SINGLE_BOUNDARY.  Some thread states are already good enough for
SINGLE_NO_EXIT but require more work to reach SINGLE_BOUNDARY.  In
other words, the core dump write starts earlier.

These might not be very significant reasons.

From what I see in the code, our NFS client has a similar issue: it calls
lf_advlock() with TDF_SBDRY set.  Below is the patch to fix that.
A similar bug existed in our fifofs, see r277321.

diff --git a/sys/fs/nfsclient/nfs_clvnops.c b/sys/fs/nfsclient/nfs_clvnops.c
index 2a8afa9..98625ee 100644
--- a/sys/fs/nfsclient/nfs_clvnops.c
+++ b/sys/fs/nfsclient/nfs_clvnops.c
@@ -2992,7 +2992,7 @@ nfs_advlock(struct vop_advlock_args *ap)
 	struct proc *p = (struct proc *)ap->a_id;
 	struct thread *td = curthread;	/* XXX */
 	struct vattr va;
-	int ret, error = EOPNOTSUPP;
+	int sbdry, ret, error = EOPNOTSUPP;
 	u_quad_t size;
 	
 	if (NFS_ISV4(vp) && (ap->a_flags & (F_POSIX | F_FLOCK)) != 0) {
@@ -3087,7 +3087,10 @@ nfs_advlock(struct vop_advlock_args *ap)
 		if ((VFSTONFS(vp->v_mount)->nm_flag & NFSMNT_NOLOCKD) != 0) {
 			size = VTONFS(vp)->n_size;
 			NFSVOPUNLOCK(vp, 0);
+			sbdry = sigallowstop();
 			error = lf_advlock(ap, &(vp->v_lockf), size);
+			if (sbdry)
+				sigdeferstop();
 		} else {
 			if (nfs_advlock_p != NULL)
 				error = nfs_advlock_p(ap);
@@ -3114,7 +3117,7 @@ nfs_advlockasync(struct vop_advlockasync_args *ap)
 {
 	struct vnode *vp = ap->a_vp;
 	u_quad_t size;
-	int error;
+	int error, sbdry;
 	
 	if (NFS_ISV4(vp))
 		return (EOPNOTSUPP);
@@ -3124,7 +3127,10 @@ nfs_advlockasync(struct vop_advlockasync_args *ap)
 	if ((VFSTONFS(vp->v_mount)->nm_flag & NFSMNT_NOLOCKD) != 0) {
 		size = VTONFS(vp)->n_size;
 		NFSVOPUNLOCK(vp, 0);
+		sbdry = sigallowstop();
 		error = lf_advlockasync(ap, &(vp->v_lockf), size);
+		if (sbdry)
+			sigdeferstop();
 	} else {
 		NFSVOPUNLOCK(vp, 0);
 		error = EOPNOTSUPP;
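
For readers who want to see the shape of the change outside the kernel, here
is a userspace toy model of the save/clear/restore pattern both hunks use.
It is only a sketch: struct model_thread, MODEL_TDF_SBDRY and every model_*
function are hypothetical stand-ins, and the semantics assumed for
sigallowstop() (clear the flag, return whether it was set) and
sigdeferstop() (set the flag) are inferred from how the patch uses them.

```c
#include <assert.h>

/* Toy stand-in for the per-thread flag word; not the real struct thread. */
#define MODEL_TDF_SBDRY	0x1u

struct model_thread {
	unsigned int flags;
};

/* Assumed semantics: clear the flag and report whether it was set before. */
static int
model_sigallowstop(struct model_thread *td)
{
	int prev = (td->flags & MODEL_TDF_SBDRY) != 0;

	td->flags &= ~MODEL_TDF_SBDRY;
	return (prev);
}

/* Assumed semantics: set the flag; return 1 only if this call set it. */
static int
model_sigdeferstop(struct model_thread *td)
{
	if ((td->flags & MODEL_TDF_SBDRY) != 0)
		return (0);
	td->flags |= MODEL_TDF_SBDRY;
	return (1);
}

/* Hypothetical stand-in for lf_advlock(): must run with the flag clear. */
static int
model_advlock_sleep(struct model_thread *td)
{
	assert((td->flags & MODEL_TDF_SBDRY) == 0);
	return (0);
}

/* Same shape as the patched call sites: clear, sleep, restore if needed. */
static int
model_nfs_advlock(struct model_thread *td)
{
	int error, sbdry;

	sbdry = model_sigallowstop(td);
	error = model_advlock_sleep(td);
	if (sbdry)
		model_sigdeferstop(td);
	return (error);
}
```

The point of saving the return value is that the caller's flag state is
restored exactly: a thread that entered with the flag set leaves with it set,
and a thread that entered with it clear is not changed.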
Received on Sat Jun 04 2016 - 07:32:41 UTC
