On Sun, Dec 29, 2013 at 04:36:18PM -0500, Mark Johnston wrote:
> Hello,
>
> While experimenting with some userland DTrace scripts, I seem to
> be consistently able to trigger a deadlock between smp_rendezvous_cpus()
> (called periodically by DTrace) and smp_targeted_tlb_shootdown():
>
> spin lock 0xffffffff80fe0620 (smp rendezvous) held by 0xfffff8000753b490 (tid 100059) too long
> panic: spin lock held too long
> [...]
> (gdb) bt
> #0 doadump (textdump=1) at pcpu.h:219
> #1 0xffffffff806387c7 in kern_reboot (howto=260) at /usr/home/markj/src/freebsd/sys/kern/kern_shutdown.c:452
> #2 0xffffffff80638cd5 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/home/markj/src/freebsd/sys/kern/kern_shutdown.c:759
> #3 0xffffffff80638d23 in panic (fmt=<value optimized out>) at /usr/home/markj/src/freebsd/sys/kern/kern_shutdown.c:688
> #4 0xffffffff80624b68 in _mtx_lock_spin_cookie (c=<value optimized out>, tid=<value optimized out>, opts=<value optimized out>, file=<value optimized out>, line=<value optimized out>)
>     at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:551
> #5 0xffffffff80624878 in __mtx_lock_spin_flags (c=<value optimized out>, opts=0, file=0xffffffff80a1ca28 "/usr/home/markj/src/freebsd/sys/kern/subr_smp.c", line=498) at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:279
> #6 0xffffffff8067eba3 in smp_rendezvous_cpus (setup_func=0xffffffff8067eae0 <smp_no_rendevous_barrier>, action_func=0xffffffff814e2d00 <dtrace_sync_func>, teardown_func=0xffffffff8067eae0 <smp_no_rendevous_barrier>,
>     arg=0x0) at /usr/home/markj/src/freebsd/sys/kern/subr_smp.c:498
> #7 0xffffffff814d5743 in dtrace_state_deadman (arg=0xfffff80007ee5c00) at /usr/home/markj/src/freebsd/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:13144
> #8 0xffffffff8064cf38 in softclock_call_cc (c=0xfffff80007ee5d40, cc=0xffffffff80fda080, direct=0) at /usr/home/markj/src/freebsd/sys/kern/kern_timeout.c:681
> #9 0xffffffff8064d2b7 in softclock (arg=<value optimized out>) at /usr/home/markj/src/freebsd/sys/kern/kern_timeout.c:809
> #10 0xffffffff8060a053 in intr_event_execute_handlers (p=<value optimized out>, ie=0xfffff80002958d00) at /usr/home/markj/src/freebsd/sys/kern/kern_intr.c:1263
> #11 0xffffffff8060aa26 in ithread_loop (arg=0xfffff80002999f60) at /usr/home/markj/src/freebsd/sys/kern/kern_intr.c:1276
> #12 0xffffffff806071a4 in fork_exit (callout=0xffffffff8060a980 <ithread_loop>, arg=0xfffff80002999f60, frame=0xfffffe0113b99ac0) at /usr/home/markj/src/freebsd/sys/kern/kern_fork.c:977
> #13 0xffffffff808d7fce in fork_trampoline () at /usr/home/markj/src/freebsd/sys/amd64/amd64/exception.S:605
>
> (kgdb) tid 100059
> [Switching to thread 67 (Thread 100059)]#0 0xffffffff808e1f08 in cpustop_handler () at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1432
> 1432 savectx(&stoppcbs[cpu]);
> (kgdb) bt
> #0 0xffffffff808e1f08 in cpustop_handler () at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1432
> #1 0xffffffff808e1ecf in ipi_nmi_handler () at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1417
> #2 0xffffffff808f1e02 in trap (frame=0xfffffe0113b68f30) at /usr/home/markj/src/freebsd/sys/amd64/amd64/trap.c:208
> #3 0xffffffff808d7ed3 in nmi_calltrap () at /usr/home/markj/src/freebsd/sys/amd64/amd64/exception.S:504
> #4 0xffffffff808e1b39 in smp_targeted_tlb_shootdown (mask={__bits = {0}}, vector=<value optimized out>, pmap=<value optimized out>, addr1=<value optimized out>, addr2=<value optimized out>)
>     at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1204
> #5 0xffffffff808e2f25 in pmap_invalidate_page (pmap=<value optimized out>, va=<value optimized out>) at /usr/home/markj/src/freebsd/sys/amd64/amd64/pmap.c:1375
> #6 0xffffffff808ec3d5 in pmap_ts_referenced (m=0xfffff800bcfc78b8) at /usr/home/markj/src/freebsd/sys/amd64/amd64/pmap.c:5743
> #7 0xffffffff808c8953 in vm_pageout () at /usr/home/markj/src/freebsd/sys/vm/vm_pageout.c:1366
> #8 0xffffffff806071a4 in fork_exit (callout=0xffffffff808c7930 <vm_pageout>, arg=0x0, frame=0xfffffe011bfabac0) at /usr/home/markj/src/freebsd/sys/kern/kern_fork.c:977
> #9 0xffffffff808d7fce in fork_trampoline () at /usr/home/markj/src/freebsd/sys/amd64/amd64/exception.S:605
>
> Indeed, there is a comment above the definition of smp_ipi_mtx in
> subr_smp.c to the effect that a deadlock can occur if, say, the target
> CPU of smp_targeted_tlb_shootdown() is spinning on smp_ipi_mtx. Is there
> any reason that this deadlock doesn't happen more often in practice? Is
> it possible to spin on smp_ipi_mtx without disabling interrupts, as that
> doesn't seem to be necessary in this case?

IMO, what is wrong here is that smp_rendezvous_cpus() is called from the
wrong context. As you noted yourself, interrupts are disabled in the
caller, and doing this operation in interrupt context is not correct.

Note that smp_tlb_shootdown() and smp_targeted_tlb_shootdown() both
assert that interrupts are enabled. IMO a similar assert would be useful
for mtx_lock_spin(&smp_ipi_mtx), but adding it in a non-ugly way does not
seem to be trivial. Perhaps a flag for mtx_init() could force the check
for a given mutex, but again, there is no MI (machine-independent)
primitive to assert that local interrupts are enabled on the CPU.
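
To make the mtx_init() flag idea concrete, here is a minimal userland
sketch of a per-mutex "interrupts must be enabled" check. It is only a
single-threaded model and not kernel code: struct model_cpu, struct
model_spin, the check_intr field, and both functions are hypothetical
names invented for illustration, and intr_enabled merely stands in for
whatever MD test a real implementation would have to use.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU; not a FreeBSD structure. */
struct model_cpu {
	bool intr_enabled;	/* stands in for the local interrupt-enable state */
};

/* Hypothetical model of a spin mutex carrying the proposed flag. */
struct model_spin {
	bool held;		/* true while the lock is owned */
	bool check_intr;	/* models a flag that would be passed at init time */
};

/*
 * Try to take the lock. If the mutex asks for the check and the caller
 * has interrupts disabled, refuse instead of spinning. A kernel version
 * would presumably panic or KASSERT here; the model just returns false.
 */
static bool
model_lock_spin(struct model_spin *m, const struct model_cpu *cpu)
{
	if (m->check_intr && !cpu->intr_enabled)
		return (false);
	assert(!m->held);	/* single-threaded model: no contention */
	m->held = true;
	return (true);
}

/* Release the lock. */
static void
model_unlock_spin(struct model_spin *m)
{
	assert(m->held);
	m->held = false;
}
```

With check_intr set, a caller that models the callout path above
(intr_enabled == false) is rejected before it can start spinning, while
a caller with interrupts enabled acquires the lock as usual. Mutexes
that leave check_intr clear behave as before, which is the point of
making the check opt-in per mutex.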