Re: panic: sleeping thread on r352386

From: Konstantin Belousov <kostikbel@gmail.com>
Date: Tue, 17 Sep 2019 11:06:58 +0300
On Tue, Sep 17, 2019 at 02:42:51PM +0900, Masachika ISHIZUKA wrote:
> >>   This panic happens on 1300047 (both r352239 and r352386) with core
> >> i5-7500 as follows. This panic does not happen on r351728 (1300044).
> >> (The following lines were typed by hand so they might have some
> >> mistyped letters.)
> >> 
> >> ==
> >> Sleeping thread (tid 100177, pid 1814) owns a non-sleepable lock
> >> KDB: stack backtrace of thread 100177:
> > 
> > 
> > https://svnweb.freebsd.org/base?view=revision&revision=352393
> 
>   Thank you for reply.
> 
>   I updated to r352431 and this does not panic. Thank you very much.
>   But 'make buildworld' fails with a segmentation fault like below.
> (buildworld is running over the nfs file system.)
> 
> --- modules-all ---
> --- ath_hal_ar5211.ko.debug ---
> objcopy --only-keep-debug ath_hal_ar5211.ko.full ath_hal_ar5211.ko.debug
> Segmentation fault (core dumped)
> *** [ath_hal_ar5211.ko.debug] Error code 139
> make[4]: stopped in /usr/altlocal/freebsd-current/src/sys/modules/ath_hal_ar5211
> 1 error
> 
>   The position of the segmentation fault is different each time.
>   The below is output of another 'make buildworld'.
> 
> --- kernel.full ---
> Segmentation fault (core dumped)
> *** [kernel.full] Error code 139
> make[2]: stopped in /usr/altlocal/freebsd-current/obj/usr/altlocal/freebsd-current/src/amd64.amd64/sys/GENERIC
> 
>   /var/log/messages shows the following.
> 
> Sep 17 11:22:56 okra kernel: Failed to fully fault in a core file segment at VA
> 0x800a00000 with size 0x163000 to be written at offset 0x84a000 for process nm
> Sep 17 11:22:56 okra kernel: pid 53593 (nm), jid 0, uid 16220: exited on signal
> 11 (core dumped)
> Sep 17 11:22:57 okra kernel: Failed to fully fault in a core file segment at VA
> 0x800a00000 with size 0x163000 to be written at offset 0x88b000 for process objcopy
> Sep 17 11:22:57 okra kernel: pid 53603 (objcopy), jid 0, uid 16220: exited on signal 11 (core dumped)
> 
>   Retry 'make buildworld'
> 
> Sep 17 12:24:05 okra kernel: Failed to fully fault in a core file segment at VA
> 0x8002f6000 with size 0x93000 to be written at offset 0x239000 for process nm
> Sep 17 12:24:05 okra kernel: pid 96873 (nm), jid 0, uid 16220: exited on signal
> 11 (core dumped)
> Sep 17 12:24:05 okra kernel: Failed to fully fault in a core file segment at VA
> 0x80035f000 with size 0x93000 to be written at offset 0x281000 for process objcopy
> Sep 17 12:24:06 okra kernel: pid 96889 (objcopy), jid 0, uid 16220: exited on signal 11 (core dumped)
> 
>   Retry 'make buildworld'
> 
> Sep 17 14:01:39 okra kernel: Failed to fully fault in a core file segment at VA
> 0x8048da000 with size 0x112000 to be written at offset 0x1a33000 for process ld.lld
> Sep 17 14:01:51 okra kernel: Failed to fully fault in a core file segment at VA
> 0x8117cc000 with size 0x1e7000 to be written at offset 0xe925000 for process ld.lld
> Sep 17 14:01:53 okra kernel: pid 50292 (ld.lld), jid 0, uid 16220: exited on signal 11 (core dumped)
> 
>   I can 'make buildworld' successfully on r351728 (1300044).

Try the following change, which more carefully avoids unneeded calls to
vnode_pager_setsize().  The real cause requires much more extensive
changes.

diff --git a/sys/fs/nfsclient/nfs_clport.c b/sys/fs/nfsclient/nfs_clport.c
index 63ea4736707..16dc7745c77 100644
--- a/sys/fs/nfsclient/nfs_clport.c
+++ b/sys/fs/nfsclient/nfs_clport.c
@@ -414,12 +414,11 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 	struct nfsnode *np;
 	struct nfsmount *nmp;
 	struct timespec mtime_save;
-	u_quad_t nsize;
-	int setnsize, error, force_fid_err;
+	u_quad_t nsize, osize;
+	int error, force_fid_err;
+	bool setnsize;
 
 	error = 0;
-	setnsize = 0;
-	nsize = 0;
 
 	/*
 	 * If v_type == VNON it is a new node, so fill in the v_type,
@@ -439,6 +438,7 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 	nmp = VFSTONFS(vp->v_mount);
 	vap = &np->n_vattr.na_vattr;
 	mtime_save = vap->va_mtime;
+	osize = vap->va_size;
 	if (writeattr) {
 		np->n_vattr.na_filerev = nap->na_filerev;
 		np->n_vattr.na_size = nap->na_size;
@@ -511,8 +511,7 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 				 * zero np->n_attrstamp to indicate that
 				 * the attributes are stale.
 				 */
-				nsize = vap->va_size = np->n_size;
-				setnsize = 1;
+				vap->va_size = np->n_size;
 				np->n_attrstamp = 0;
 				KDTRACE_NFS_ATTRCACHE_FLUSH_DONE(vp);
 			} else if (np->n_flag & NMODIFIED) {
@@ -526,22 +525,9 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 					np->n_size = vap->va_size;
 					np->n_flag |= NSIZECHANGED;
 				}
-				nsize = np->n_size;
-				setnsize = 1;
-			} else if (vap->va_size < np->n_size) {
-				/*
-				 * When shrinking the size, the call to
-				 * vnode_pager_setsize() cannot be done
-				 * with the mutex held, so delay it until
-				 * after the mtx_unlock call.
-				 */
-				nsize = np->n_size = vap->va_size;
-				np->n_flag |= NSIZECHANGED;
-				setnsize = 1;
 			} else {
-				nsize = np->n_size = vap->va_size;
+				np->n_size = vap->va_size;
 				np->n_flag |= NSIZECHANGED;
-				setnsize = 1;
 			}
 		} else {
 			np->n_size = vap->va_size;
@@ -579,6 +565,21 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 	if (np->n_attrstamp != 0)
 		KDTRACE_NFS_ATTRCACHE_LOAD_DONE(vp, vap, error);
 #endif
+	nsize = vap->va_size;
+	if (nsize == osize) {
+		setnsize = false;
+	} else if (nsize > osize) {
+		vnode_pager_setsize(vp, nsize);
+		setnsize = false;
+	} else {
+		/*
+		 * When shrinking the size, the call to
+		 * vnode_pager_setsize() cannot be done with the mutex
+		 * held, because we might need to wait for a busy
+		 * page.  Delay it until after the node is unlocked.
+		 */
+		setnsize = true;
+	}
 	NFSUNLOCKNODE(np);
 	if (setnsize)
 		vnode_pager_setsize(vp, nsize);
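For reference, the size-update policy the patch introduces can be modeled
as a small pure function (a userspace sketch only; the enum and function
names below are illustrative and do not appear in the kernel source): when
the cached size is unchanged, nothing is done; when it grows, calling
vnode_pager_setsize() under the node lock is safe; when it shrinks, the
call must be deferred until after NFSUNLOCKNODE() because it may have to
wait for a busy page.

```c
#include <assert.h>

typedef unsigned long long u_quad_t;

/*
 * Mirrors the patched decision in nfscl_loadattrcache(): decide whether
 * vnode_pager_setsize() may run immediately (growing, safe under the
 * node lock) or must be deferred until after the node is unlocked
 * (shrinking, which may wait on a busy page).  Names are hypothetical.
 */
enum resize_action {
	RESIZE_NONE,		/* size unchanged: no pager call needed */
	RESIZE_NOW,		/* growing: call under the node lock */
	RESIZE_DEFERRED		/* shrinking: call after NFSUNLOCKNODE() */
};

static enum resize_action
resize_policy(u_quad_t osize, u_quad_t nsize)
{
	if (nsize == osize)
		return (RESIZE_NONE);
	if (nsize > osize)
		return (RESIZE_NOW);
	return (RESIZE_DEFERRED);
}
```

The point of splitting the decision this way is lock ordering: only the
shrink case can block, so only it needs the `setnsize` flag and the call
after the unlock.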
Received on Tue Sep 17 2019 - 06:07:19 UTC