Re: panic: APIC: Previous IPI is stuck

From: Brian Fundakowski Feldman <green_at_FreeBSD.org> Date: Sat, 2 Oct 2004 02:02:01 -0400

On Mon, Sep 27, 2004 at 04:35:44PM -0400, John Baldwin wrote:
> On Friday 24 September 2004 08:24 pm, Andy Farkas wrote:
> > I have been having this problem for a few weeks now. Glad I'm not the only
> > one. My box is a 4xPPro running 5.3-BETA5. It panics with either ULE
> > or 4BSD.
> >
> > My theory is that a physical IPI gets lost somewhere and the kernel spins
> > waiting for it. But that's just a stab in the dark because nobody cares to
> > explain why IPIs would be stuck.
> 
> The panic means that a previous IPI sent from the same CPU has not finished
> being sent.  I've yet to determine why this happens.  You can try editing
> sys/i386/i386/local_apic.c and turning on 'DETECT_DEADLOCK' (I think it is
> just commented out) and seeing if that improves stability.  I also see this
> on a 4xPIIXeon system I use for testing.
> 
> > -andyf
> >
> > On Fri, 24 Sep 2004, Brian Fundakowski Feldman wrote:
> > > This is on a 2xAthlon with the SCHED_ULE, HZ=1000, SW_WATCHDOG, and
> > > nothing really special in development.
> > >
> > > FreeBSD green.homeunix.org 6.0-CURRENT FreeBSD 6.0-CURRENT #110:
> > > Wed Sep 22 11:28:27 EDT 2004
> > > root_at_green.homeunix.org:/usr/src/sys/i386/compile/GREEN  i386
> > >
> > > panic: APIC: Previous IPI is stuck
> > > cpuid = 1
> > > KDB: stack backtrace:
> > > kdb_backtrace(c063cae7,1,c063c5e7,d4411b28,c1da2000) at kdb_backtrace+0x2e
> > > panic(c063c5e7,1,f3,1,2) at panic+0x128
> > > lapic_ipi_vectored(f3,1,c1da2494,1,c0675910) at 64) at sched_add_internal+0x21e
> > > kseq_assign(c0675910,1,c0625a07,5e0,c1da1540) at kseq_assign+0x4a
> > > sched_clock(c1da2000,2,c0621165,17e,d4411c54) at sched_clock+0x74
> > > statclock(d4411c54,c1ecc840,d4411c3c,c05edc8b,d4411c54) at statclock+0xf8
> > > rtcintr(d4411c54,c0487af4,c06733a0,2,8) at rtcintr+0x4f
> > > intr_execute_handlers(c1dca8f0,d4411c54,d4411cb4,c05ea0e3,38) at intr_execute_handlers+0xab
> > > lapic_handle_intr(38) at lapic_handle_intr+0x3a
> > > Xapic_isr1() at Xapic_isr1+0x33
> > > --- interrupt, eip = 0xc04a640a, esp = 0xd4411c98, ebp = 0xd4411cb4 ---
> > > _mtx_lock_sleep(c06733e0,c1da2000,0,c06220e8,222) at _mtx_lock_sleep+0x13a
> > > _mtx_lock_flags(c06733e0,0,c06220e8,222,0) at _mtx_lock_flags+0xc0
> > > ithread_loop(c1da6200,d4411d48,c0621edb,31f,c1da6200) at ithread_loop+0x15a
> > > fork_exit(c0499660,c1da6200,d4411d48) at fork_exit+0xc6
> > > fork_trampoline() at fork_trampoline+0x8
> > > --- trap 0x1, eip = 0, esp = 0xd4411d7c, ebp = 0 ---
> > > KDB: enter: panic
> > > panic: APIC: Previous IPI is stuck
> > > cpuid = 1
> > > boot() called on cpu#1
> > > Uptime: 2d0h16m55s
> > > ^^ full hang instead of reset

Okay, I just got another one of these.  It is exactly the same as the one
above, except that this time it was the softclock() interrupt that was
locking Giant rather than the interrupt thread loop.  So the other CPU
owned Giant at the time, and the scheduling CPU was trying to acquire it
when it was interrupted to run statclock().

This is way too coincidental to ignore.

SCHED_ULE is far too complex for me to understand much of right now;
what prevents sched_clock() from calling kseq_assign() multiple times
per CPU?  Are we _absolutely_100%_certain_ that functionality works
correctly?

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green_at_FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\