2012/5/17, Andriy Gapon <avg_at_freebsd.org>: > on 25/01/2012 23:52 Andriy Gapon said the following: >> on 24/01/2012 14:32 Gleb Smirnoff said the following: >>> Yes, now: >>> >>> Rebooting... >>> lock order reversal: >>> 1st 0xffffffff80937140 smp rendezvous (smp rendezvous) _at_ >>> /usr/src/head/sys/kern/kern_shutdown.c:542 >>> 2nd 0xfffffe0001f5d838 uart_hwmtx (uart_hwmtx) _at_ >>> /usr/src/head/sys/dev/uart/uart_cpu.h:92 >>> panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx _at_ >>> /usr/src/head/sys/kern/kern_cons.c:500 >> >> OK, so it's just a plain LOR between smp rendezvous and uart_hwmtx, no >> new >> details... >> >> It's still intriguing to me why the LOR *doesn't* happen [*] with >> stop_scheduler_on_panic=0. But I don't see a productive way to pursue >> this >> investigation further. > > Salve Glebius! > After your recent nudging I took yet another look at this issue and it seems > that > I have some findings. > > For those who might get interested here is a convenience reference to the > whole > thread on gmane: http://thread.gmane.org/gmane.os.freebsd.current/139307 > > A short summary. > Prerequisites: an SMP x86 system, its kernel is configured with WITNESS && > !WITNESS_SKIPSPIN (this is important) and a system uses serial console via > uart. > Then, if stop_scheduler_on_panic is set to zero the system can be rebooted > without > a problem. On the other hand, if stop_scheduler_on_panic is enabled, then > the > system first runs into a LOR when calling printf() in cpu_reset() and then > it runs > into a panic when printf is recursively called from witness(9) to report the > LOR. > The panic happens because of the recursion on cnputs_mtx, which should > ensure > that cnputs() output is not intermingled and which is not flagged to > support > recursion. > > There are two things about this report that greatly confused and puzzled > me: > 1. stop_scheduler_on_panic variable is used _only_ in panic(9). So how > could it > be possible that changing its value affects behavior of the system when > panic(9) > is not called?! > > 2. The LOR in question happens between "smp rendezvous" (smp_ipi_mtx) and > "uart_hwmtx" (sc_hwmtx_s in uart core) spin locks. The order of these locks > is > actually predefined in witness order_lists[] such that uart_hwmtx must come > before > smp_ipi_mtx. But in the reboot path we first take smp_ipi_mtx in > shutdown_reset(), then we call cpu_reset, then it calls printf and from > there we > get to uart_hwmtx for serial console output. So the order between these > spinlocks > is always violated in the x86 SMP reboot path. > How come witness(9) doesn't _always_ detect this LOR? > How come it didn't detect this LOR before any "stop scheduler" commits?! > > [Spoiler alert :-)] > > Turns out that the two puzzles above are closely related. > Let's first list all the things that change when stop_scheduler_on_panic is > enabled and a panic happens: > - other CPUs are stopped (forced to spin) > - interrupts on current CPU are disabled > - by virtue of the above the current thread should be the only thread > running > (unless it executes a voluntary switch) > - all locks are "busted", they are completely ignored / bypassed > - by virtue of the above no lock invariants and witness checks are > performed, so > no lock order checking, no recursion checking, etc > > So, what I observe is this: when stop_scheduler_on_panic is disabled, the > LOR is > actually detected as it should be. witness(9) works properly here. Once > the LOR > is detected witness(9) wants to report it using printf(9). That's where we > run > into the cnputs_mtx recursion panic. It's all exactly as with > stop_scheduler_on_panic enabled. Then panic(9) wants to report the panic > using > printf(9), which goes to cnputs() again, where _mtx_lock_spin_flags() > detects > locks recursion again (this is independent of witness(9)) and calls > panic(9) > again. Then panic(9) wants to report the panic using printf(9)... > I assume that when the stack is exhausted we run into a double fault and > dblfault_handler wants to print something again... Likely we eventually run > into > a triple fault which resets the machine. Although, I recall some old > reports of > machines hanging during reboot in a place suspiciously close to where the > described ordeal happens... > But if the machine doesn't hang we get a full appearance of the reset > successfully > happening (modulo some last messages missing). > > With stop_scheduler_on_panic enabled and all the locks being ignored we, of > course, do not run into any secondary lock recursions and resulting panics. > So > the system is able to at least panic "gracefully" (still we should have > just > reported the LOR gracefully, no panic is required). > > Some obvious conclusions: > - the "smp rendezvous" and "uart_hwmtx" LOR is real and it is the true cause > of > the problem; it should be fixed one way or other - either by correcting > witness > order_lists[] or by avoiding the LOR in shutdown_reset/cpu_reset; > - because witness(9) uses printf(9) to report problems, it is very fragile > to use > witness with any locks that can be acquired under printf(9) > - stop_scheduler_on_panic just uncovers the true bug Thanks for working on this. My thoughts on the matter: - We should not printf() in cpu_reset() which is supposed to be a very low level primitives, usable by any context. Acquiring console spinlocks there is not really a wise choice. - I'd like to see witness being based on an alternative approach, but we may find something that doesn't wrap up (thus a bufring ala KTR is not appropriate) and can work also in SMP context (thus a lockfree algorithm is not appropriate). I don't have any good solution for this on the top of my idea, but of course, relying on a path acquiring locks for a lock protector is not a great idea. The only other thing we may do is marking locks involved in the printf() path with NO_WITNESS, but that of course won't solve the deadlock problem itself. - On a related note, I once had a patch that was having BSP recovering from deadlock in the reset code and panic/print informations, under DIAGNOSTIC. I may find it again and post here. Definitively, I think that we need to make cpu_reset() locks agnostic and if you agree I can make a patch for that. Attilio -- Peace can only be achieved by understanding - A. EinsteinReceived on Thu May 17 2012 - 12:07:40 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:27 UTC