On Thu, Nov 17, 2011 at 01:07:38AM +0200, Alexander Motin wrote: > On 17.11.2011 00:21, Andriy Gapon wrote: > >on 16/11/2011 21:27 Fabian Keil said the following: > >>Kostik Belousov<kostikbel_at_gmail.com> wrote: > >> > >>>I was tricked into finishing the work by Andrey Gapon, who developed > >>>the patch to reliably stop other processors on panic. The patch > >>>greatly improves the chances of getting dump on panic on SMP host. > >> > >>I tested the patch trying to get a dump (from the debugger) for > >>kern/162036, which currently results in the double fault reported in: > >>http://lists.freebsd.org/pipermail/freebsd-current/2011-September/027766.html > >> > >>It didn't help, but also didn't make anything worse. > >> > >>Fabian > > > >The mi_switch recursion looks very familiar to me: > >mi_switch() at mi_switch+0x270 > >critical_exit() at critical_exit+0x9b > >spinlock_exit() at spinlock_exit+0x17 > >mi_switch() at mi_switch+0x275 > >critical_exit() at critical_exit+0x9b > >spinlock_exit() at spinlock_exit+0x17 > >[several pages of the previous three lines skipped] > >mi_switch() at mi_switch+0x275 > >critical_exit() at critical_exit+0x9b > >spinlock_exit() at spinlock_exit+0x17 > >intr_even_schedule_thread() at intr_event_schedule_thread+0xbb > >ahci_end_transaction() at ahci_end_transaction+0x398 > >ahci_ch_intr() at ahci_ch_intr+0x2b5 > >ahcipoll() at ahcipoll+0x15 > >xpt_polled_action() at xpt_polled_action+0xf7 > > > >In fact I once discussed with jhb this recursion triggered from a different > >place. To quote myself: > ><avg> spinlock_exit -> critical_exit -> mi_switch -> kdb_switch -> > >thread_unlock -> spinlock_exit -> critical_exit -> mi_switch -> ... > ><avg> in the kdb context > ><avg> this issue seems to be triggered by td_owepreempt being true at > >the time > >kdb is entered > ><avg> and there of course has to be an initial spinlock_exit call > >somewhere > ><avg> in my case it's because of usb keyboard > ><avg> I wonder if it would make sense to clear td_owepreempt right > >before > >calling kdb_switch in mi_switch > ><avg> instead of in sched_switch() > ><avg> clearing td_owepreempt seems like a scheduler-independent > >operation to me > ><avg> or is it better to just skip locking in usb when kdb_active is set > ><avg> ? > > > >The workaround described above should work in this case. > >Another possibility is to pessimize mtx_unlock_spin() implementations to > >check > >SCHEDULER_STOPPED() and to bypass any further actions in that case. But > >that > >would add unnecessary overhead to the sunny day code paths. > > > >Going further up the stack one can come up with the following proposals: > >- check SCHEDULER_STOPPED() swi_sched() and return early > >- do not call swi_sched() from xpt_done() if we somehow know that we are > >in a > >polling mode > > There is no flag in CAM now to indicate polling mode, but if needed, it > should not be difficult to add one and not call swi_sched(). I have the following change for eons on my test boxes. Without it, I simply cannot get _any_ dump. diff --git a/sys/cam/cam_xpt.c b/sys/cam/cam_xpt.c index 10b89c7..a38e42f 100644 --- a/sys/cam/cam_xpt.c +++ b/sys/cam/cam_xpt.c _at__at_ -4230,7 +4230,7 _at__at_ xpt_done(union ccb *done_ccb) TAILQ_INSERT_TAIL(&cam_simq, sim, links); mtx_unlock(&cam_simq_lock); sim->flags |= CAM_SIM_ON_DONEQ; - if (first) + if (first && panicstr == NULL) swi_sched(cambio_ih, 0); } }
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:20 UTC