on 16/11/2011 21:27 Fabian Keil said the following:
> Kostik Belousov <kostikbel_at_gmail.com> wrote:
>
>> I was tricked into finishing the work by Andrey Gapon, who developed
>> the patch to reliably stop other processors on panic. The patch
>> greatly improves the chances of getting dump on panic on SMP host.
>
> I tested the patch trying to get a dump (from the debugger) for
> kern/162036, which currently results in the double fault reported in:
> http://lists.freebsd.org/pipermail/freebsd-current/2011-September/027766.html
>
> It didn't help, but also didn't make anything worse.
>
> Fabian

The mi_switch recursion looks very familiar to me:

mi_switch() at mi_switch+0x270
critical_exit() at critical_exit+0x9b
spinlock_exit() at spinlock_exit+0x17
mi_switch() at mi_switch+0x275
critical_exit() at critical_exit+0x9b
spinlock_exit() at spinlock_exit+0x17
[several pages of the previous three lines skipped]
mi_switch() at mi_switch+0x275
critical_exit() at critical_exit+0x9b
spinlock_exit() at spinlock_exit+0x17
intr_event_schedule_thread() at intr_event_schedule_thread+0xbb
ahci_end_transaction() at ahci_end_transaction+0x398
ahci_ch_intr() at ahci_ch_intr+0x2b5
ahcipoll() at ahcipoll+0x15
xpt_polled_action() at xpt_polled_action+0xf7

In fact I once discussed with jhb this same recursion, triggered from a
different place. To quote myself:

<avg> spinlock_exit -> critical_exit -> mi_switch -> kdb_switch -> thread_unlock -> spinlock_exit -> critical_exit -> mi_switch -> ...
<avg> in the kdb context
<avg> this issue seems to be triggered by td_owepreempt being true at the time kdb is entered
<avg> and there of course has to be an initial spinlock_exit call somewhere
<avg> in my case it's because of usb keyboard
<avg> I wonder if it would make sense to clear td_owepreempt right before calling kdb_switch in mi_switch
<avg> instead of in sched_switch()
<avg> clearing td_owepreempt seems like a scheduler-independent operation to me
<avg> or is it better to just skip locking in usb when kdb_active is set
<avg> ?

The workaround described above should work in this case.
Another possibility is to pessimize the mtx_unlock_spin() implementations
to check SCHEDULER_STOPPED() and to bypass any further actions in that
case. But that would add unnecessary overhead to the sunny day code paths.

Going further up the stack one can come up with the following proposals:
- check SCHEDULER_STOPPED() in swi_sched() and return early
- do not call swi_sched() from xpt_done() if we somehow know that we are
  in a polling mode

-- 
Andriy Gapon

Received on Wed Nov 16 2011 - 21:21:31 UTC
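The recursion quoted above, and the effect of clearing td_owepreempt
before kdb_switch, can be sketched with a toy user-space model. This is
not kernel code: every toy_* name, the depth cap, and the simplified
td_critnest handling are assumptions made purely for illustration.

```c
#include <stdbool.h>

/* Toy stand-in for struct thread; only the two fields that matter here. */
struct toy_thread {
	bool owepreempt;	/* models td_owepreempt */
	int  critnest;		/* models td_critnest */
};

#define TOY_MAX_DEPTH 64	/* arbitrary cap so the model terminates */

static int  toy_depth;		/* current toy_mi_switch() nesting */
static int  toy_max_depth;	/* deepest nesting observed */
static bool toy_clear_early;	/* apply the proposed workaround? */

static void toy_mi_switch(struct toy_thread *td);

/* Leaving the outermost critical section honours a pending preemption. */
static void toy_critical_exit(struct toy_thread *td)
{
	if (td->critnest == 1) {
		td->critnest = 0;
		if (td->owepreempt) {
			td->critnest = 1;	/* models taking the thread lock */
			toy_mi_switch(td);
		}
	} else {
		td->critnest--;
	}
}

static void toy_spinlock_exit(struct toy_thread *td)
{
	toy_critical_exit(td);
}

/* Models mi_switch() as entered while the debugger is active. */
static void toy_mi_switch(struct toy_thread *td)
{
	if (++toy_depth > toy_max_depth)
		toy_max_depth = toy_depth;
	if (toy_depth < TOY_MAX_DEPTH) {
		if (toy_clear_early)
			td->owepreempt = false;	/* proposed: clear before kdb_switch */
		/* kdb_switch -> thread_unlock -> spinlock_exit */
		toy_spinlock_exit(td);
	}
	toy_depth--;
}

/* Returns the deepest toy_mi_switch() nesting reached. */
int toy_simulate(bool clear_early)
{
	struct toy_thread td = { .owepreempt = true, .critnest = 1 };

	toy_depth = 0;
	toy_max_depth = 0;
	toy_clear_early = clear_early;
	toy_spinlock_exit(&td);	/* the initial spinlock_exit call */
	return toy_max_depth;
}
```

Under these assumptions toy_simulate(false) recurses until it hits the
depth cap, while toy_simulate(true) stops after a single toy_mi_switch()
call, because the flag is already clear when the thread lock is dropped.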