Re: "Sleeping with non-sleepable lock" in NFS on recent -current

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Mon, 16 Sep 2019 09:32:52 +0300

On Mon, Sep 16, 2019 at 04:12:05PM +1000, Peter Jeremy wrote:
> I'm consistently seeing panics in the NFS code on recent -current on arm64.
> The panics are one of the following two:
> Sleeping on "vmopar" with the following non-sleepable locks held:
> exclusive sleep mutex NEWNFSnode lock (NEWNFSnode lock) r = 0 (0xfffffd0078b346f0) locked @ /usr/src/sys/fs/nfsclient/nfs_clport.c:432
> 
> Sleeping thread (tid 100077, pid 35) owns a non-sleepable lock
> 
> Both panics have nearly identical backtraces (see below).  I'm running
> diskless on a Rock64 with both filesystem and swap over NFS.  The panics
> can be fairly reliably triggered by any of:
> * "make -j4 buildworld"
> * linking the kernel (as part of buildkernel)
> * "make installworld"
> 
> Has anyone else seen this?
> 
> The first panic (sleeping on vmopar) has a backtrace:
> sched_switch() at mi_switch+0x19c
>          pc = 0xffff0000002ab368  lr = 0xffff00000028a9f4
>          sp = 0xffff000061192660  fp = 0xffff000061192680
> 
> mi_switch() at sleepq_switch+0x100
>          pc = 0xffff00000028a9f4  lr = 0xffff0000002d56dc
>          sp = 0xffff000061192690  fp = 0xffff0000611926d0
> 
> sleepq_switch() at sleepq_wait+0x48
>          pc = 0xffff0000002d56dc  lr = 0xffff0000002d5594
>          sp = 0xffff0000611926e0  fp = 0xffff000061192700
> 
> sleepq_wait() at _sleep+0x2c4  [***]
>          pc = 0xffff0000002d5594  lr = 0xffff000000289eec
>          sp = 0xffff000061192710  fp = 0xffff0000611927b0
> 
> _sleep() at vm_object_page_remove+0x178  [***]
>          pc = 0xffff000000289eec  lr = 0xffff00000052211c
>          sp = 0xffff0000611927c0  fp = 0xffff000061192820
> 
> vm_object_page_remove() at vnode_pager_setsize+0xc0
>          pc = 0xffff00000052211c  lr = 0xffff000000539a70
>          sp = 0xffff000061192830  fp = 0xffff000061192870
> 
> vnode_pager_setsize() at nfscl_loadattrcache+0x2e8
>          pc = 0xffff000000539a70  lr = 0xffff0000001ed4b4
>          sp = 0xffff000061192880  fp = 0xffff0000611928e0
> 
> nfscl_loadattrcache() at ncl_writerpc+0x104
>          pc = 0xffff0000001ed4b4  lr = 0xffff0000001e2158
>          sp = 0xffff0000611928f0  fp = 0xffff000061192a40
> 
> ncl_writerpc() at ncl_doio+0x36c
>          pc = 0xffff0000001e2158  lr = 0xffff0000001f0370
>          sp = 0xffff000061192a50  fp = 0xffff000061192ae0
> 
> ncl_doio() at nfssvc_iod+0x228
>          pc = 0xffff0000001f0370  lr = 0xffff0000001f1d88
>          sp = 0xffff000061192af0  fp = 0xffff000061192b50
> 
> nfssvc_iod() at fork_exit+0x7c
>          pc = 0xffff0000001f1d88  lr = 0xffff00000023ff5c
>          sp = 0xffff000061192b60  fp = 0xffff000061192b90
> 
> fork_exit() at fork_trampoline+0x10
>          pc = 0xffff00000023ff5c  lr = 0xffff000000562c34
>          sp = 0xffff000061192ba0  fp = 0x0000000000000000
> 
> 
> For the second panic, the [***] change to:
> sleepq_wait() at vm_page_sleep_if_busy+0x80
> vm_page_sleep_if_busy() at vm_object_page_remove+0xfc

Weird, since this should have been fixed a long time ago.  Anyway, please
try the following; it should fix the remaining cases.

diff --git a/sys/fs/nfsclient/nfs_clport.c b/sys/fs/nfsclient/nfs_clport.c
index 471e029a8b5..098de1ced80 100644
--- a/sys/fs/nfsclient/nfs_clport.c
+++ b/sys/fs/nfsclient/nfs_clport.c
@@ -511,10 +511,10 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 				 * zero np->n_attrstamp to indicate that
 				 * the attributes are stale.
 				 */
-				vap->va_size = np->n_size;
+				nsize = vap->va_size = np->n_size;
+				setnsize = 1;
 				np->n_attrstamp = 0;
 				KDTRACE_NFS_ATTRCACHE_FLUSH_DONE(vp);
-				vnode_pager_setsize(vp, np->n_size);
 			} else if (np->n_flag & NMODIFIED) {
 				/*
 				 * We've modified the file: Use the larger
@@ -526,7 +526,8 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 					np->n_size = vap->va_size;
 					np->n_flag |= NSIZECHANGED;
 				}
-				vnode_pager_setsize(vp, np->n_size);
+				nsize = np->n_size;
+				setnsize = 1;
 			} else if (vap->va_size < np->n_size) {
 				/*
 				 * When shrinking the size, the call to
@@ -540,7 +541,7 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 			} else {
 				np->n_size = vap->va_size;
 				np->n_flag |= NSIZECHANGED;
-				vnode_pager_setsize(vp, np->n_size);
+				setnsize = 1;
 			}
 		} else {
 			np->n_size = vap->va_size;
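
The hunks above replace direct vnode_pager_setsize() calls with assignments
to nsize and setnsize; the part of the function that consumes those
variables is not shown in this message.  As a rough user-space sketch of
the general pattern only (record the decision while the mutex is held, make
the possibly-sleeping call after it is released), assuming that is how the
flags are used: every demo_* name below is hypothetical, and a pthread
mutex stands in for the NFS node lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an NFS node; NOT the kernel's struct nfsnode. */
struct demo_node {
	pthread_mutex_t	lock;		/* plays the role of the node mutex */
	bool		lock_held;	/* tracked only so the demo can check itself */
	uint64_t	size;		/* cached size, loosely like np->n_size */
	uint64_t	pager_size;	/* size last handed to the "pager" */
	int		calls_under_lock; /* blocking calls made with lock held */
};

/*
 * Hypothetical stand-in for a call that may sleep, such as
 * vnode_pager_setsize().  It counts how often it is entered while the
 * node lock is held, which is the situation the panic complains about.
 */
static void
demo_pager_setsize(struct demo_node *np, uint64_t nsize)
{
	if (np->lock_held)
		np->calls_under_lock++;
	np->pager_size = nsize;
}

/* Update the cached size; defer the blocking call until after unlock. */
static void
demo_load_attr(struct demo_node *np, uint64_t server_size)
{
	bool setnsize = false;
	uint64_t nsize = 0;

	pthread_mutex_lock(&np->lock);
	np->lock_held = true;
	if (server_size != np->size) {
		np->size = server_size;
		nsize = np->size;	/* remember what to tell the pager */
		setnsize = true;	/* remember that a call is needed */
	}
	np->lock_held = false;
	pthread_mutex_unlock(&np->lock);

	/* The lock is dropped here, so sleeping is allowed. */
	if (setnsize)
		demo_pager_setsize(np, nsize);
}
```

Calling demo_load_attr() with a new size updates both size and pager_size
while leaving calls_under_lock at zero; calling it again with the same size
makes no pager call at all.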
Received on Mon Sep 16 2019 - 04:33:01 UTC