On Thursday 24 June 2004 10:36 am, Gerrit Nagelhout wrote:
> Here's some information about another slightly different
> lockup. CPU0 is blocked in smp_targeted_tlb_shootdown (vector 0xf5).
> CPU2 & 3 are in acpi_cpu_c1. CPU1 (again) is in acpi_cpu_c1,
> but it has an interrupt pending. In this case, the pending
> interrupt is bit 27. 224 + 27 = 251 = IPI_HARDCLOCK.
> How can I figure out how CPU1 got stuck in this state? As
> far as I can tell, there is either a h/w problem, or CPU1
> has gone to sleep after starting to handle an interrupt.
> Thanks,

Do all of the deadlocks stop if you turn off halting when idle by doing
'sysctl machdep.cpu_idle_hlt=0'?

> Gerrit
>
> P0>dumpAllLocalApic
> CPU 0
> ID: 0x6000000
> TPR: 0x0
> PPR: 0x0
> icr_lo:0xf5

last sent INVLPG

> APR: 0x0
> ISR0: 0x0
> ISR1: 0x0
> ISR2: 0x0
> ISR3: 0x0
> ISR4: 0x0
> ISR5: 0x0
> ISR6: 0x0
> ISR7: 0x0
> IRR0: 0x0
> IRR1: 0x0
> IRR2: 0x0
> IRR3: 0x0
> IRR4: 0x0
> IRR5: 0x0
> IRR6: 0x0
> IRR7: 0x18000000

This actually has 2 pending interrupts that it needs to service: 252
(statclock) and 251 (hardclock).

> TMR0: 0x0
> TMR1: 0x0
> TMR2: 0x0
> TMR3: 0x0
> TMR4: 0x0
> TMR5: 0x0
> TMR6: 0x0
> TMR7: 0x0
> CPU 1
> ID: 0x7000000
> TPR: 0x0
> PPR: 0xf0
> icr_lo:0xf3

last sent AST

> APR: 0x0
> ISR0: 0x0
> ISR1: 0x0
> ISR2: 0x0
> ISR3: 0x0
> ISR4: 0x0
> ISR5: 0x0
> ISR6: 0x0
> ISR7: 0x8000000

Currently handling hardclock.

> IRR0: 0x0
> IRR1: 0x0
> IRR2: 0x0
> IRR3: 0x0
> IRR4: 0x0
> IRR5: 0x0
> IRR6: 0x0
> IRR7: 0x18200000

This has 3 pending (INVLPG, hardclock, statclock) and is currently
servicing hardclock (ISR7 bit 27 above). This means some CPU has sent
INVLPG (0xf5) and is spinning with interrupts disabled, waiting for CPU 1
to ack. This could be CPU 0.

> TMR0: 0x0
> TMR1: 0x0
> TMR2: 0x0
> TMR3: 0x0
> TMR4: 0x0
> TMR5: 0x0
> TMR6: 0x0
> TMR7: 0x0
> CPU 2
> ID: 0x0
> TPR: 0x0
> PPR: 0x0
> icr_lo:0xfb

last sent hardclock

> APR: 0x0
> ISR0: 0x0
> ISR1: 0x0
> ISR2: 0x0
> ISR3: 0x0
> ISR4: 0x0
> ISR5: 0x0
> ISR6: 0x0
> ISR7: 0x0
> IRR0: 0x0
> IRR1: 0x1000000
> IRR2: 0x0
> IRR3: 0x0
> IRR4: 0x20000
> IRR5: 0x0
> IRR6: 0x0
> IRR7: 0x0
> TMR0: 0x0
> TMR1: 0x0
> TMR2: 0x1000
> TMR3: 0x0
> TMR4: 0x20000
> TMR5: 0x0
> TMR6: 0x0
> TMR7: 0x0

CPU 2 must have interrupts disabled, as it has 2 PCI interrupts pending
(IRQs 56 and 145; there must be a lot of I/O APICs in this box!), both of
which are level-triggered (hence the bits set in TMR).

> CPU 3
> ID: 0x1000000
> TPR: 0x0
> PPR: 0x0
> icr_lo:0xf3

last sent an AST

> APR: 0x0
> ISR0: 0x0
> ISR1: 0x0
> ISR2: 0x0
> ISR3: 0x0
> ISR4: 0x0
> ISR5: 0x0
> ISR6: 0x0
> ISR7: 0x0
> IRR0: 0x0
> IRR1: 0x0
> IRR2: 0x0
> IRR3: 0x0
> IRR4: 0x0
> IRR5: 0x0
> IRR6: 0x0
> IRR7: 0x0
> TMR0: 0x0
> TMR1: 0x0
> TMR2: 0x0
> TMR3: 0x0
> TMR4: 0x0
> TMR5: 0x0
> TMR6: 0x0
> TMR7: 0x0

Nothing pending or currently executing. It's OK for this one (CPU 3) to be
halted, but neither CPU 2 nor CPU 1 should be halted. CPU 1 claims to be
executing Xhardclock (it is the only CPU with a bit set in its ISR), which
does an EOI fewer than 20 instructions after it starts. Does the ISR for
CPU 1 clear if you let it continue for a bit?

-- 
John Baldwin <jhb_at_FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

Received on Thu Jun 24 2004 - 16:37:51 UTC
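
The bit arithmetic used in the replies above (register index times 32, plus
the bit position, gives the vector, e.g. 224 + 27 = 251) can be sketched in
Python. This is only an illustrative sketch, not a FreeBSD tool: the names
decode_apic_bits, describe and KNOWN_VECTORS are hypothetical, and the
vector names are limited to the ones mentioned in this thread.

```python
# Vector names as mentioned in this thread; any other vector stays unnamed.
KNOWN_VECTORS = {0xF3: "AST", 0xF5: "INVLPG", 0xFB: "hardclock", 0xFC: "statclock"}


def decode_apic_bits(regs):
    """Return the vectors set in eight 32-bit words (ISR0..7, IRR0..7 or TMR0..7)."""
    vectors = []
    for index, word in enumerate(regs):
        for bit in range(32):
            if word & (1 << bit):
                # Word N covers vectors 32*N .. 32*N + 31.
                vectors.append(32 * index + bit)
    return vectors


def describe(vectors):
    """Map vectors to the names used in the thread, where one is known."""
    return [KNOWN_VECTORS.get(v, f"vector {v}") for v in vectors]


# Register values copied from the dump above.
cpu1_isr = [0, 0, 0, 0, 0, 0, 0, 0x08000000]
cpu1_irr = [0, 0, 0, 0, 0, 0, 0, 0x18200000]
cpu2_irr = [0, 0x01000000, 0, 0, 0x00020000, 0, 0, 0]

print("CPU 1 in service:", describe(decode_apic_bits(cpu1_isr)))
print("CPU 1 pending:   ", describe(decode_apic_bits(cpu1_irr)))
print("CPU 2 pending:   ", decode_apic_bits(cpu2_irr))
```

Run against the dump, this reports hardclock in service on CPU 1 with INVLPG,
hardclock and statclock pending, and 56 and 145 pending on CPU 2, which
matches the readings given in the replies.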