smp_rendezvous_cpus() deadlock

From: Mark Johnston <markj_at_freebsd.org>
Date: Sun, 29 Dec 2013 16:36:18 -0500

Hello,

While experimenting with some userland DTrace scripts, I seem to be
able to consistently trigger a deadlock between smp_rendezvous_cpus()
(called periodically by DTrace) and smp_targeted_tlb_shootdown():

spin lock 0xffffffff80fe0620 (smp rendezvous) held by 0xfffff8000753b490 (tid 100059) too long
panic: spin lock held too long
[...]
(gdb) bt
#0  doadump (textdump=1) at pcpu.h:219
#1  0xffffffff806387c7 in kern_reboot (howto=260) at /usr/home/markj/src/freebsd/sys/kern/kern_shutdown.c:452
#2  0xffffffff80638cd5 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/home/markj/src/freebsd/sys/kern/kern_shutdown.c:759
#3  0xffffffff80638d23 in panic (fmt=<value optimized out>) at /usr/home/markj/src/freebsd/sys/kern/kern_shutdown.c:688
#4  0xffffffff80624b68 in _mtx_lock_spin_cookie (c=<value optimized out>, tid=<value optimized out>, opts=<value optimized out>, file=<value optimized out>, line=<value optimized out>)
    at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:551
#5  0xffffffff80624878 in __mtx_lock_spin_flags (c=<value optimized out>, opts=0, file=0xffffffff80a1ca28 "/usr/home/markj/src/freebsd/sys/kern/subr_smp.c", line=498) at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:279
#6  0xffffffff8067eba3 in smp_rendezvous_cpus (setup_func=0xffffffff8067eae0 <smp_no_rendevous_barrier>, action_func=0xffffffff814e2d00 <dtrace_sync_func>, teardown_func=0xffffffff8067eae0 <smp_no_rendevous_barrier>, 
        arg=0x0) at /usr/home/markj/src/freebsd/sys/kern/subr_smp.c:498
#7  0xffffffff814d5743 in dtrace_state_deadman (arg=0xfffff80007ee5c00) at /usr/home/markj/src/freebsd/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:13144
#8  0xffffffff8064cf38 in softclock_call_cc (c=0xfffff80007ee5d40, cc=0xffffffff80fda080, direct=0) at /usr/home/markj/src/freebsd/sys/kern/kern_timeout.c:681
#9  0xffffffff8064d2b7 in softclock (arg=<value optimized out>) at /usr/home/markj/src/freebsd/sys/kern/kern_timeout.c:809
#10 0xffffffff8060a053 in intr_event_execute_handlers (p=<value optimized out>, ie=0xfffff80002958d00) at /usr/home/markj/src/freebsd/sys/kern/kern_intr.c:1263
#11 0xffffffff8060aa26 in ithread_loop (arg=0xfffff80002999f60) at /usr/home/markj/src/freebsd/sys/kern/kern_intr.c:1276
#12 0xffffffff806071a4 in fork_exit (callout=0xffffffff8060a980 <ithread_loop>, arg=0xfffff80002999f60, frame=0xfffffe0113b99ac0) at /usr/home/markj/src/freebsd/sys/kern/kern_fork.c:977
#13 0xffffffff808d7fce in fork_trampoline () at /usr/home/markj/src/freebsd/sys/amd64/amd64/exception.S:605

(kgdb) tid 100059
[Switching to thread 67 (Thread 100059)]#0  0xffffffff808e1f08 in cpustop_handler () at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1432
1432            savectx(&stoppcbs[cpu]);
(kgdb) bt
#0  0xffffffff808e1f08 in cpustop_handler () at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1432
#1  0xffffffff808e1ecf in ipi_nmi_handler () at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1417
#2  0xffffffff808f1e02 in trap (frame=0xfffffe0113b68f30) at /usr/home/markj/src/freebsd/sys/amd64/amd64/trap.c:208
#3  0xffffffff808d7ed3 in nmi_calltrap () at /usr/home/markj/src/freebsd/sys/amd64/amd64/exception.S:504
#4  0xffffffff808e1b39 in smp_targeted_tlb_shootdown (mask={__bits = {0}}, vector=<value optimized out>, pmap=<value optimized out>, addr1=<value optimized out>, addr2=<value optimized out>)
    at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1204
#5  0xffffffff808e2f25 in pmap_invalidate_page (pmap=<value optimized out>, va=<value optimized out>) at /usr/home/markj/src/freebsd/sys/amd64/amd64/pmap.c:1375
#6  0xffffffff808ec3d5 in pmap_ts_referenced (m=0xfffff800bcfc78b8) at /usr/home/markj/src/freebsd/sys/amd64/amd64/pmap.c:5743
#7  0xffffffff808c8953 in vm_pageout () at /usr/home/markj/src/freebsd/sys/vm/vm_pageout.c:1366
#8  0xffffffff806071a4 in fork_exit (callout=0xffffffff808c7930 <vm_pageout>, arg=0x0, frame=0xfffffe011bfabac0) at /usr/home/markj/src/freebsd/sys/kern/kern_fork.c:977
#9  0xffffffff808d7fce in fork_trampoline () at /usr/home/markj/src/freebsd/sys/amd64/amd64/exception.S:605

Indeed, there is a comment above the definition of smp_ipi_mtx in
subr_smp.c noting that a deadlock can occur if, for example, the target
CPU of smp_targeted_tlb_shootdown() is spinning on smp_ipi_mtx: the
spinning CPU has interrupts disabled and so never handles the shootdown
IPI, while the initiating CPU holds smp_ipi_mtx and waits for that IPI
to be acknowledged. Is there any reason this deadlock doesn't happen
more often in practice? Would it be possible to spin on smp_ipi_mtx
without disabling interrupts, since that doesn't seem to be necessary
in this case?

Thanks,
-Mark
Received on Sun Dec 29 2013 - 20:36:56 UTC
