Re: STI, HLT in acpi_cpu_idle_c1

From: John Baldwin <jhb_at_FreeBSD.org> Date: Thu, 24 Jun 2004 14:38:29 -0400

On Thursday 24 June 2004 10:36 am, Gerrit Nagelhout wrote:
> Here's some information about another slightly different
> lockup.  CPU0 is blocked in smp_targeted_tlb_shootdown (vector 0xf5).
> CPU2 & 3 are in acpi_cpu_c1.  CPU1 (again) is in acpi_cpu_c1,
> but it has an interrupt pending.  In this case, the pending
> interrupt is bit 27.  224 + 27 = 251 = IPI_HARDCLOCK.
> How can I figure out how CPU1 got stuck in this state?  As
> far as I can tell, there is either a h/w problem, or CPU1
> has gone to sleep after starting to handle an interrupt.
> Thanks,

Do all of the deadlocks stop if you turn off halting when idle by doing 
'sysctl machdep.cpu_idle_hlt=0'?

> Gerrit
>
> P0>dumpAllLocalApic
> CPU 0
> ID:    0x6000000
> TPR:   0x0
> PPR:   0x0
> icr_lo:0xf5

last sent INVLPG

> APR:   0x0
> ISR0:  0x0
> ISR1:  0x0
> ISR2:  0x0
> ISR3:  0x0
> ISR4:  0x0
> ISR5:  0x0
> ISR6:  0x0
> ISR7:  0x0
> IRR0:  0x0
> IRR1:  0x0
> IRR2:  0x0
> IRR3:  0x0
> IRR4:  0x0
> IRR5:  0x0
> IRR6:  0x0
> IRR7:  0x18000000

This actually has 2 pending interrupts that it needs to service, both 252 
(statclock) and 251 (hardclock).
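
For anyone following along, here is a minimal sketch of how a dump like this
maps to vector numbers. It assumes the usual local APIC layout, where the
IRR, ISR and TMR are each eight 32-bit words and bit b of word n stands for
vector 32*n + b. The name table only covers the vectors mentioned in this
thread, and decode_bank/describe are illustrative helpers, not kernel
functions.

```python
# Sketch: decode a local APIC 256-bit register bank (IRR/ISR/TMR) into vectors.
# Bit b of word n corresponds to vector 32*n + b.

# Names taken from this thread only; not a complete list of IPI vectors.
VECTOR_NAMES = {0xf3: "AST", 0xf5: "INVLPG", 0xfb: "hardclock", 0xfc: "statclock"}


def decode_bank(words):
    """Return the vectors (ascending) whose bits are set in an 8-word bank."""
    vectors = []
    for n, word in enumerate(words):
        for bit in range(32):
            if word & (1 << bit):
                vectors.append(32 * n + bit)
    return vectors


def describe(vectors):
    """Map vectors to the names used in this thread, where known."""
    return [VECTOR_NAMES.get(v, "vector %d" % v) for v in vectors]


# CPU 0 from the dump above: IRR0..IRR6 are 0x0, IRR7 is 0x18000000.
cpu0_irr = [0, 0, 0, 0, 0, 0, 0, 0x18000000]
print(decode_bank(cpu0_irr))            # [251, 252]
print(describe(decode_bank(cpu0_irr)))  # ['hardclock', 'statclock']
```

Bits 27 and 28 of word 7 give 224 + 27 = 251 and 224 + 28 = 252, which is
where the two pending clock interrupts come from.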

> TMR0:  0x0
> TMR1:  0x0
> TMR2:  0x0
> TMR3:  0x0
> TMR4:  0x0
> TMR5:  0x0
> TMR6:  0x0
> TMR7:  0x0
> CPU 1
> ID:    0x7000000
> TPR:   0x0
> PPR:   0xf0
> icr_lo:0xf3

last sent AST

> APR:   0x0
> ISR0:  0x0
> ISR1:  0x0
> ISR2:  0x0
> ISR3:  0x0
> ISR4:  0x0
> ISR5:  0x0
> ISR6:  0x0
> ISR7:  0x8000000

Currently handling hardclock

> IRR0:  0x0
> IRR1:  0x0
> IRR2:  0x0
> IRR3:  0x0
> IRR4:  0x0
> IRR5:  0x0
> IRR6:  0x0
> IRR7:  0x18200000

This has 3 pending (INVLPG, hardclock, statclock) and is currently servicing 
hardclock.  This means some CPU has sent INVLPG (0xf5) and is spinning with 
interrupts disabled waiting for CPU 1 to ack.  This could be CPU 0.
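
A rough sketch of why those three stay pending, assuming the priority rule
from the Intel manuals: the upper nibble of PPR is the larger of the TPR
class and the class of the highest in-service vector, and the APIC only hands
the CPU a pending vector whose class (vector >> 4) is strictly greater than
the PPR class. The helper names are made up for illustration, and the case
where the TPR class equals the in-service class is model-specific (treated
here as giving a zero low nibble).

```python
# Sketch: model local APIC priority arbitration for the CPU 1 dump above.

def highest_vector(words):
    """Highest vector set in an 8-word bank (IRR or ISR), or 0 if empty."""
    for n in range(7, -1, -1):
        if words[n]:
            return 32 * n + words[n].bit_length() - 1
    return 0


def compute_ppr(tpr, isr_words):
    """PPR from TPR and the highest in-service vector."""
    isrv = highest_vector(isr_words)
    if (tpr >> 4) > (isrv >> 4):
        return tpr & 0xff
    return isrv & 0xf0


def deliverable(irr_words, ppr):
    """Pending vector the APIC would deliver next, or None if masked by PPR."""
    vector = highest_vector(irr_words)
    return vector if (vector >> 4) > (ppr >> 4) else None


# CPU 1: TPR 0x0, ISR7 0x8000000 (hardclock in service), IRR7 0x18200000.
cpu1_isr = [0, 0, 0, 0, 0, 0, 0, 0x8000000]
cpu1_irr = [0, 0, 0, 0, 0, 0, 0, 0x18200000]
cpu1_ppr = compute_ppr(0x0, cpu1_isr)
print(hex(cpu1_ppr))                    # 0xf0, matching the dumped PPR
print(deliverable(cpu1_irr, cpu1_ppr))  # None

# CPU 0: TPR 0x0, nothing in service, IRR7 0x18000000.
cpu0_irr = [0, 0, 0, 0, 0, 0, 0, 0x18000000]
print(deliverable(cpu0_irr, compute_ppr(0x0, [0] * 8)))  # 252
```

Under this model, 0xf5, 0xfb and 0xfc are all in class 0xf, so none of them
can be delivered on CPU 1 until the in-service hardclock gets its EOI.  On
CPU 0 the APIC itself is not holding anything back (PPR is 0), which fits
with that CPU running with interrupts disabled.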

> TMR0:  0x0
> TMR1:  0x0
> TMR2:  0x0
> TMR3:  0x0
> TMR4:  0x0
> TMR5:  0x0
> TMR6:  0x0
> TMR7:  0x0
> CPU 2
> ID:    0x0
> TPR:   0x0
> PPR:   0x0
> icr_lo:0xfb

last sent hardclock

> APR:   0x0
> ISR0:  0x0
> ISR1:  0x0
> ISR2:  0x0
> ISR3:  0x0
> ISR4:  0x0
> ISR5:  0x0
> ISR6:  0x0
> ISR7:  0x0
> IRR0:  0x0
> IRR1:  0x1000000
> IRR2:  0x0
> IRR3:  0x0
> IRR4:  0x20000
> IRR5:  0x0
> IRR6:  0x0
> IRR7:  0x0
> TMR0:  0x0
> TMR1:  0x0
> TMR2:  0x1000
> TMR3:  0x0
> TMR4:  0x20000
> TMR5:  0x0
> TMR6:  0x0
> TMR7:  0x0

CPU 2 must have interrupts disabled as it has 2 PCI interrupts pending (IRQs 
56 and 145, must have a lot of I/O APICs in this box!), both of which are 
level triggered (hence bits set in TMR).

> CPU 3
> ID:    0x1000000
> TPR:   0x0
> PPR:   0x0
> icr_lo:0xf3

last sent an AST

> APR:   0x0
> ISR0:  0x0
> ISR1:  0x0
> ISR2:  0x0
> ISR3:  0x0
> ISR4:  0x0
> ISR5:  0x0
> ISR6:  0x0
> ISR7:  0x0
> IRR0:  0x0
> IRR1:  0x0
> IRR2:  0x0
> IRR3:  0x0
> IRR4:  0x0
> IRR5:  0x0
> IRR6:  0x0
> IRR7:  0x0
> TMR0:  0x0
> TMR1:  0x0
> TMR2:  0x0
> TMR3:  0x0
> TMR4:  0x0
> TMR5:  0x0
> TMR6:  0x0
> TMR7:  0x0

Nothing pending or currently executing.  It's ok for this one to be halted 
(CPU3), but neither CPU2 nor CPU1 should be halted.  CPU1 claims to be 
executing Xhardclock, which does an EOI in < 20 instructions after it starts.  
Does the ISR for CPU 1 clear if you let it continue for a bit?

-- 
John Baldwin <jhb_at_FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org