On Nov 13, 2011, at 3:32 AM, Kostik Belousov wrote: > I was tricked into finishing the work by Andrey Gapon, who developed > the patch to reliably stop other processors on panic. The patch > greatly improves the chances of getting dump on panic on SMP host. > Several people already saw the patchset, and I remember that Andrey > posted it to some lists. > > The change stops other (*) processors early upon the panic. This way, > no parallel manipulation of the kernel memory is performed by CPUs. > In particular, the kernel memory map is static. Patch prevents the > panic thread from blocking and switching out. > > * - in the context of the description, other means not current. > > Since other threads are not run anymore, lock owner cannot release a > lock which is required by panic thread. Due to this, we need to fake > a lock acquisition after the panic, which adds minimal overhead to the > locking cost. The patch tries to not add any overhead on the fast path > of the lock acquire. The check for the after-panic condition was > reduced to single memory access, done only when the quick cas lock > attempt failed, and braced with __unlikely compiler hint. > > For now, the new mode of operation is disabled by default, since some > further USB changes are needed to make USB keyboard usable in that > environment. > > With the patch, getting a dump from the machine without debugger > compiled in is much more realistic. Please comment, I will commit the > change in 2 weeks unless strong reasons not to are given. > > http://people.freebsd.org/~kib/misc/stop_cpus_on_panic.1.patch > We have many systems running Andriy's latest version of the patch under 8.2. I also brought in the related USB patch; without it, the system hangs up while dumping almost every time. With both patches in place* it has worked flawlessly for us. -Andrew * - with one change: always do the critical_enter() / critical_exit(). Using spinlock_enter() blocks the software watchdog, which needs to still be active in case the dump hangs for other reasons. This is obviously not ideal but the best solution I have right now. We also stop all of the network interfaces at the beginning of boot(). -------------------------------------------------- Andrew Boyer aboyer_at_averesystems.comReceived on Thu Nov 17 2011 - 16:28:24 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:20 UTC