On Sun, Aug 30, 2015 at 01:27:29PM +0300, Andriy Gapon wrote:
>
> I've got the following kernel panic seemingly at random.
> I have been using the kernel for about a week without any issues and I
> wasn't doing anything special when the panic occurred.
> Does this panic ring any bells?  Could the problem already be fixed by
> more recent changes?
>
> r286985
>
> panic: Stray timeout
>
> (kgdb) bt
> #0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:260
> #1  0xffffffff8063236f in kern_reboot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:328
> #2  0xffffffff806329d4 in vpanic (fmt=<value optimized out>,
>     ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:508
> #3  0xffffffff806326d3 in panic (fmt=0x0)
>     at /usr/src/sys/kern/kern_shutdown.c:441
> #4  0xffffffff80677dea in taskqueue_timeout_func (arg=<value optimized out>)
>     at /usr/src/sys/kern/subr_taskqueue.c:269
> #5  0xffffffff8064858d in softclock_call_cc (c=0xfffffe000241cb50,
>     cc=0xffffffff81066000, direct=0) at /usr/src/sys/kern/kern_timeout.c:722
> #6  0xffffffff80648917 in softclock (arg=<value optimized out>)
>     at /usr/src/sys/kern/kern_timeout.c:851
> #7  0xffffffff805fe47f in intr_event_execute_handlers (p=0xfffff800059b0548,
>     ie=0xfffff8000599d900) at /usr/src/sys/kern/kern_intr.c:1262
> #8  0xffffffff805fefac in ithread_execute_handlers (p=0xfffff800059b0548,
>     ie=0xfffff8000599d900) at /usr/src/sys/kern/kern_intr.c:1275
> #9  0xffffffff805fee1b in ithread_loop (arg=<value optimized out>)
>     at /usr/src/sys/kern/kern_intr.c:1356
> #10 0xffffffff805fba9b in fork_exit (callout=0xffffffff805fedc0 <ithread_loop>,
>     arg=0xfffff800059adb60, frame=0xfffffe01dc55bc00)
>     at /usr/src/sys/kern/kern_fork.c:1006
> #11 0xffffffff808073de in fork_trampoline ()
>     at /usr/src/sys/libkern/explicit_bzero.c:28
> #12 0x0000000000000000 in ?? ()
>
> (kgdb) fr 5
> #5  0xffffffff8064858d in softclock_call_cc (c=0xfffffe000241cb50,
>     cc=0xffffffff81066000, direct=0) at /usr/src/sys/kern/kern_timeout.c:722
> 722                     c_func(c_arg);
> (kgdb) p *c
> $1 = {c_links = {le = {le_next = 0x0, le_prev = 0xffffffff81066108},
>     sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0,
>     tqe_prev = 0xffffffff81066108}}, c_time = 171699241227799,
>   c_precision = 2684354, c_arg = 0xfffffe000241cb28,
>   c_func = 0xffffffff80677db0 <taskqueue_timeout_func>,
>   c_lock = 0xfffff80013d66d30, c_flags = 2, c_iflags = 144, c_cpu = 0}
>
> c_flags = CALLOUT_ACTIVE
> c_iflags = CALLOUT_RETURNUNLOCKED | CALLOUT_PROCESSED
>
> (kgdb) p *cc
> $2 = {cc_lock = {lock_object = {lo_name = 0xffffffff809a0177 "callout",
>     lo_flags = 720896, lo_data = 0, lo_witness = 0xfffffe0001fd1400},
>     mtx_lock = 4}, cc_exec_entity = 0xffffffff81066080, cc_next = 0x0,
>   cc_callout = 0xfffffe0002116000, cc_callwheel = 0xfffffe0002238000,
>   cc_expireq = {tqh_first = 0x0, tqh_last = 0xffffffff81066108},
>   cc_callfree = {slh_first = 0xfffffe00022372c0},
>   cc_firstevent = 171699349138844, cc_lastscan = 171699243988142,
>   cc_cookie = 0xfffff800059a9b00, cc_bucket = 10456, cc_inited = 1,
>   cc_ktr_event_name = 0xffffffff81066140 "callwheel cpu 0"}
> (kgdb) p c_arg
> $3 = (void *) 0xfffffe000241cb28
>
> (kgdb) down
> #4  0xffffffff80677dea in taskqueue_timeout_func (arg=<value optimized out>)
>     at /usr/src/sys/kern/subr_taskqueue.c:269
> 269             KASSERT((timeout_task->f & DT_CALLOUT_ARMED) != 0,
>                     ("Stray timeout"));
> (kgdb) p *(struct timeout_task *)0xfffffe000241cb28
> $4 = {q = 0xfffff80013d66d00, t = {ta_link = {stqe_next = 0x0},
>     ta_pending = 0, ta_priority = 0,
>     ta_func = 0xffffffff82197220 <ttm_bo_delayed_workqueue>,
>     ta_context = 0xfffffe000241c5c0}, c = {c_links = {le = {le_next = 0x0,
>     le_prev = 0xffffffff81066108}, sle = {sle_next = 0x0},
>     tqe = {tqe_next = 0x0, tqe_prev = 0xffffffff81066108}},
>     c_time = 171699241227799, c_precision = 2684354,
>     c_arg = 0xfffffe000241cb28,
>     c_func = 0xffffffff80677db0 <taskqueue_timeout_func>,
>     c_lock = 0xfffff80013d66d30, c_flags = 2, c_iflags = 144, c_cpu = 0},
>   f = 0}
> (kgdb) p *$4.q
> $5 = {tq_queue = {stqh_first = 0x0, stqh_last = 0xfffff80013d66d00},
>   tq_enqueue = 0xffffffff80678c30 <taskqueue_thread_enqueue>,
>   tq_context = 0xffffffff811130d8, tq_active = {tqh_first = 0x0,
>     tqh_last = 0xfffff80013d66d20}, tq_mutex = {lock_object = {
>     lo_name = 0xffffffff809a52a1 "taskqueue", lo_flags = 16973824,
>     lo_data = 0, lo_witness = 0xfffffe0001fd7200},
>     mtx_lock = 18446735277710583024}, tq_threads = 0xfffff80013dd1bd0,
>   tq_tcount = 1, tq_spin = 0, tq_flags = 5, tq_callouts = 1,
>   tq_callbacks = 0xfffff80013d66d68, tq_cb_contexts = 0xfffff80013d66d78}
>
> BTW, I see the following potential problem.  taskqueue_cancel_timeout()
> unconditionally resets DT_CALLOUT_ARMED, so that happens even if
> callout_stop() signals that the callout is going to be called.  In that
> case, when taskqueue_timeout_func() gets called, it is going to run into
> the assertion.

This is strange; I do not think that could be the right explanation of
this issue.  The taskqueue callout is initialized with a mutex, which
means that the callout_stop() caller:
- must own the mutex;
- is synchronous with the callout.
In other words, the callout cannot be running when
taskqueue_cancel_timeout() calls callout_stop(); it can only have been
dequeued, with the callout function not yet called.  If callout_stop()
is performed in the meantime, between dropping the callout_cpu lock and
locking the mutex, the callout function must not be run.

Received on Sun Aug 30 2015 - 14:16:10 UTC