On 17.11.2011 00:21, Andriy Gapon wrote:
> on 16/11/2011 21:27 Fabian Keil said the following:
>> Kostik Belousov<kostikbel_at_gmail.com> wrote:
>>
>>> I was tricked into finishing the work by Andrey Gapon, who developed
>>> the patch to reliably stop other processors on panic. The patch
>>> greatly improves the chances of getting dump on panic on SMP host.
>>
>> I tested the patch trying to get a dump (from the debugger) for
>> kern/162036, which currently results in the double fault reported in:
>> http://lists.freebsd.org/pipermail/freebsd-current/2011-September/027766.html
>>
>> It didn't help, but also didn't make anything worse.
>>
>> Fabian
>
> The mi_switch recursion looks very familiar to me:
> mi_switch() at mi_switch+0x270
> critical_exit() at critical_exit+0x9b
> spinlock_exit() at spinlock_exit+0x17
> mi_switch() at mi_switch+0x275
> critical_exit() at critical_exit+0x9b
> spinlock_exit() at spinlock_exit+0x17
> [several pages of the previous three lines skipped]
> mi_switch() at mi_switch+0x275
> critical_exit() at critical_exit+0x9b
> spinlock_exit() at spinlock_exit+0x17
> intr_even_schedule_thread() at intr_event_schedule_thread+0xbb
> ahci_end_transaction() at ahci_end_transaction+0x398
> ahci_ch_intr() at ahci_ch_intr+0x2b5
> ahcipoll() at ahcipoll+0x15
> xpt_polled_action() at xpt_polled_action+0xf7
>
> In fact I once discussed with jhb this recursion triggered from a different
> place. To quote myself:
> <avg> spinlock_exit -> critical_exit -> mi_switch -> kdb_switch ->
> thread_unlock -> spinlock_exit -> critical_exit -> mi_switch -> ...
> <avg> in the kdb context
> <avg> this issue seems to be triggered by td_owepreempt being true at the time
> kdb is entered
> <avg> and there of course has to be an initial spinlock_exit call somewhere
> <avg> in my case it's because of usb keyboard
> <avg> I wonder if it would make sense to clear td_owepreempt right before
> calling kdb_switch in mi_switch
> <avg> instead of in sched_switch()
> <avg> clearing td_owepreempt seems like a scheduler-independent operation to me
> <avg> or is it better to just skip locking in usb when kdb_active is set
> <avg> ?
>
> The workaround described above should work in this case.
> Another possibility is to pessimize mtx_unlock_spin() implementations to check
> SCHEDULER_STOPPED() and to bypass any further actions in that case. But that
> would add unnecessary overhead to the sunny day code paths.
>
> Going further up the stack one can come up with the following proposals:
> - check SCHEDULER_STOPPED() swi_sched() and return early
> - do not call swi_sched() from xpt_done() if we somehow know that we are in a
> polling mode

There is no flag in CAM now to indicate polling mode, but if needed, it
should not be difficult to add one and not call swi_sched().

--
Alexander Motin
Received on Wed Nov 16 2011 - 22:07:40 UTC
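
For readers following the thread, the recursion and the td_owepreempt workaround quoted above can be illustrated with a small userland toy model. This is only a sketch and not kernel code: every toy_* name, the struct layout and the TOY_MAX_DEPTH limit are made up for illustration, and the real mi_switch(), critical_exit() and spinlock_exit() do considerably more. The model assumes the debugger is active, so the switch path never reaches sched_switch() and nothing clears the flag unless the workaround is applied.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limit standing in for running out of kernel stack. */
#define TOY_MAX_DEPTH 64

/* Hypothetical stand-in for the few bits of thread state that matter here. */
struct toy_thread {
	bool owepreempt;   /* models td_owepreempt */
	bool clear_early;  /* apply the proposed workaround? */
	int  depth;        /* current nesting of toy_mi_switch() */
	int  max_depth;    /* deepest nesting observed */
	bool overflowed;   /* reached TOY_MAX_DEPTH */
};

static void toy_spinlock_exit(struct toy_thread *td);

/*
 * Models mi_switch() in the kdb context: the kdb_switch() path is taken,
 * which ends in thread_unlock() -> spinlock_exit().
 */
static void toy_mi_switch(struct toy_thread *td)
{
	if (++td->depth > td->max_depth)
		td->max_depth = td->depth;
	if (td->depth >= TOY_MAX_DEPTH) {
		/* The real system would double fault around here. */
		td->overflowed = true;
		td->depth--;
		return;
	}
	if (td->clear_early)
		td->owepreempt = false;	/* the workaround from the thread */
	toy_spinlock_exit(td);		/* kdb_switch -> thread_unlock -> ... */
	td->depth--;
}

/* Models critical_exit(): a pending preemption triggers a switch. */
static void toy_critical_exit(struct toy_thread *td)
{
	if (td->owepreempt)
		toy_mi_switch(td);
}

/* Models spinlock_exit(), which leaves the critical section. */
static void toy_spinlock_exit(struct toy_thread *td)
{
	toy_critical_exit(td);
}

/* Enter the model with a preemption already owed, as in the report. */
struct toy_thread toy_run(bool clear_early)
{
	struct toy_thread td = { .owepreempt = true, .clear_early = clear_early };

	toy_spinlock_exit(&td);
	assert(td.depth == 0);	/* the recursion has fully unwound */
	return td;
}
```

In this model toy_run(false) keeps recursing until it hits the artificial depth limit, while toy_run(true) enters toy_mi_switch() exactly once. The same shape of early return would apply to the SCHEDULER_STOPPED() proposals, just placed at a different level of the stack.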