"Sleeping with non-sleepable lock" in NFS on recent -current

From: Peter Jeremy <peter_at_rulingia.com>
Date: Mon, 16 Sep 2019 16:12:05 +1000

I'm consistently seeing panics in the NFS code on recent -current on aarch64.
Each panic is one of the following two:
Sleeping on "vmopar" with the following non-sleepable locks held:
exclusive sleep mutex NEWNFSnode lock (NEWNFSnode lock) r = 0 (0xfffffd0078b346f0) locked @ /usr/src/sys/fs/nfsclient/nfs_clport.c:432

Sleeping thread (tid 100077, pid 35) owns a non-sleepable lock

Both panics have nearly identical backtraces (see below).  I'm running
diskless on a Rock64 with both filesystem and swap over NFS.  The panics
can be fairly reliably triggered by any of:
* "make -j4 buildworld"
* linking the kernel (as part of buildkernel)
* "make installworld"

Has anyone else seen this?

The first panic (sleeping on vmopar) has a backtrace:
sched_switch() at mi_switch+0x19c
         pc = 0xffff0000002ab368  lr = 0xffff00000028a9f4
         sp = 0xffff000061192660  fp = 0xffff000061192680

mi_switch() at sleepq_switch+0x100
         pc = 0xffff00000028a9f4  lr = 0xffff0000002d56dc
         sp = 0xffff000061192690  fp = 0xffff0000611926d0

sleepq_switch() at sleepq_wait+0x48
         pc = 0xffff0000002d56dc  lr = 0xffff0000002d5594
         sp = 0xffff0000611926e0  fp = 0xffff000061192700

sleepq_wait() at _sleep+0x2c4  [***]
         pc = 0xffff0000002d5594  lr = 0xffff000000289eec
         sp = 0xffff000061192710  fp = 0xffff0000611927b0

_sleep() at vm_object_page_remove+0x178  [***]
         pc = 0xffff000000289eec  lr = 0xffff00000052211c
         sp = 0xffff0000611927c0  fp = 0xffff000061192820

vm_object_page_remove() at vnode_pager_setsize+0xc0
         pc = 0xffff00000052211c  lr = 0xffff000000539a70
         sp = 0xffff000061192830  fp = 0xffff000061192870

vnode_pager_setsize() at nfscl_loadattrcache+0x2e8
         pc = 0xffff000000539a70  lr = 0xffff0000001ed4b4
         sp = 0xffff000061192880  fp = 0xffff0000611928e0

nfscl_loadattrcache() at ncl_writerpc+0x104
         pc = 0xffff0000001ed4b4  lr = 0xffff0000001e2158
         sp = 0xffff0000611928f0  fp = 0xffff000061192a40

ncl_writerpc() at ncl_doio+0x36c
         pc = 0xffff0000001e2158  lr = 0xffff0000001f0370
         sp = 0xffff000061192a50  fp = 0xffff000061192ae0

ncl_doio() at nfssvc_iod+0x228
         pc = 0xffff0000001f0370  lr = 0xffff0000001f1d88
         sp = 0xffff000061192af0  fp = 0xffff000061192b50

nfssvc_iod() at fork_exit+0x7c
         pc = 0xffff0000001f1d88  lr = 0xffff00000023ff5c
         sp = 0xffff000061192b60  fp = 0xffff000061192b90

fork_exit() at fork_trampoline+0x10
         pc = 0xffff00000023ff5c  lr = 0xffff000000562c34
         sp = 0xffff000061192ba0  fp = 0x0000000000000000


For the second panic, the two frames marked [***] change to:
sleepq_wait() at vm_page_sleep_if_busy+0x80
vm_page_sleep_if_busy() at vm_object_page_remove+0xfc
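
Reading the two messages together with the backtrace, the pattern looks like
this: the NEWNFSnode mutex is held (taken at nfs_clport.c:432) while
nfscl_loadattrcache() calls vnode_pager_setsize(), which can end up sleeping
in vm_object_page_remove().  The following is only a user-space sketch of
that pattern and of one common way to avoid it (record the new size under the
mutex, then make the call that may sleep after unlocking).  Every name in it
(model_mtx, model_node, load_attrs_unsafe, load_attrs_deferred and so on) is
hypothetical; it is not the FreeBSD code and not necessarily how the real
function should be changed.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are FreeBSD APIs. */
struct model_mtx {
    bool held;
};

struct model_node {
    struct model_mtx lock;  /* stands in for the "NEWNFSnode lock" */
    long cached_size;       /* size recorded in the attribute cache */
    long pager_size;        /* size known to the VM object */
};

int nonsleepable_held;      /* number of non-sleepable locks currently held */
int sleep_violations;       /* times a may-sleep call ran with such a lock held */

void model_lock(struct model_mtx *m)
{
    m->held = true;
    nonsleepable_held++;
}

void model_unlock(struct model_mtx *m)
{
    m->held = false;
    nonsleepable_held--;
}

/*
 * Stand-in for a call that may sleep, like vnode_pager_setsize() in the
 * backtrace.  The model only counts the violation; a kernel would panic.
 */
void model_setsize_may_sleep(struct model_node *n, long size)
{
    if (nonsleepable_held > 0)
        sleep_violations++;
    n->pager_size = size;
}

/* Pattern suggested by the panic: may-sleep call made under the mutex. */
void load_attrs_unsafe(struct model_node *n, long new_size)
{
    model_lock(&n->lock);
    n->cached_size = new_size;
    model_setsize_may_sleep(n, new_size);   /* mutex still held here */
    model_unlock(&n->lock);
}

/* Alternative: note the size change under the mutex, act on it afterwards. */
void load_attrs_deferred(struct model_node *n, long new_size)
{
    bool need_setsize;

    model_lock(&n->lock);
    need_setsize = (n->cached_size != new_size);
    n->cached_size = new_size;
    model_unlock(&n->lock);

    if (need_setsize)
        model_setsize_may_sleep(n, new_size);   /* no mutex held here */
}
```

In this model, load_attrs_unsafe() increments sleep_violations, while
load_attrs_deferred() updates pager_size without doing so.  Whether deferring
the call is safe in the real code depends on what else relies on the size
being updated while the node lock is held, which the sketch does not model.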


-- 
Peter Jeremy

Received on Mon Sep 16 2019 - 04:12:28 UTC