On Tuesday 22 June 2004 09:01 pm, Gerrit Nagelhout wrote:
> Thanks for the detailed info on this. It looks like CPU1 is trying
> to service the interrupt because PPR = 0xf0, and TPR = 0x00. It is
> also the only CPU that has a bit set in ISR. In this case, CPU 3
> was initiating the IPI (although I don't know why its icr_lo is
> 0xc00f6 because I was expecting it to be 0xc00f3 (and it was in
> previous lockups). I still have no idea why CPU1 is not handling
> this interrupt though. I am still getting used to this emulator, but
> I think the values I am reading are believable:
>
> P3>dumpAllLocalApic
> CPU 0
> ID: 0x6000000
> TPR: 0x0
> PPR: 0x0
> icr_lo:0xf3
> ISR0: 0x0
> ISR1: 0x0
> ISR2: 0x0
> ISR3: 0x0
> ISR4: 0x0
> ISR5: 0x0
> ISR6: 0x0
> ISR7: 0x0
> CPU 1
> ID: 0x7000000
> TPR: 0x0
> PPR: 0xf0
> icr_lo:0xf3
> ISR0: 0x0
> ISR1: 0x0
> ISR2: 0x0
> ISR3: 0x0
> ISR4: 0x0
> ISR5: 0x0
> ISR6: 0x0
> ISR7: 0x80000

bit 19 is set, so vector of 224 + 19 = 243.

#define APIC_LOCAL_INTS 240
#define APIC_IPI_INTS (APIC_LOCAL_INTS + 3)
#define IPI_AST APIC_IPI_INTS /* Generate software trap. */

So it's an IPI_AST which is EOI'd before we do anything:

IDTVEC(cpuast)
	PUSH_FRAME
	movl $KDSEL, %eax
	movl %eax, %ds /* use KERNEL data segment */
	movl %eax, %es
	movl $KPSEL, %eax
	movl %eax, %fs

	movl lapic, %edx
	movl $0, LA_EOI(%edx) /* End Of Interrupt to APIC */

	FAKE_MCOUNT(TF_EIP(%esp))

	MEXITCOUNT
	jmp doreti

Hmm nothing in the kernel does an IPI to all but self with IPI_AST. Only with
IPI_RENDEZVOUS in MI code.

> CPU 2
> ID: 0x0
> TPR: 0x0
> PPR: 0x0
> icr_lo:0xfb
> ISR0: 0x0
> ISR1: 0x0
> ISR2: 0x0
> ISR3: 0x0
> ISR4: 0x0
> ISR5: 0x0
> ISR6: 0x0
> ISR7: 0x0
> CPU 3
> ID: 0x1000000
> TPR: 0x0
> PPR: 0x0
> icr_lo:0xc00f6

0xf6 is the vector 246

#define IPI_INVLRNG (APIC_IPI_INTS + 3)

That is an IPI that is sent via all_but_self. *sigh* And the TLB shootdown
code does sit and spin in a loop with interrupts disabled after sending the
IPI. Hmm, I do see one possible bug.
It's only safe to spin like that if the same lock protects all such spin cases.
For the lazypmap stuff a different lock is used. You can try this patch to see
if it helps any. Kris Kenneway, you might want to try this, too on the box with
the lazyfix timeouts.

Index: pmap.c
===================================================================
RCS file: /usr/cvs/src/sys/i386/i386/pmap.c,v
retrieving revision 1.473
diff -u -r1.473 pmap.c
--- pmap.c	17 Jun 2004 06:16:57 -0000	1.473
+++ pmap.c	23 Jun 2004 14:39:32 -0000
@@ -1292,7 +1296,7 @@
 	while ((mask = pmap->pm_active) != 0) {
 		spins = 50000000;
 		mask = mask & -mask;	/* Find least significant set bit */
-		mtx_lock_spin(&lazypmap_lock);
+		mtx_lock_spin(&smp_tlb_mtx);
 #ifdef PAE
 		lazyptd = vtophys(pmap->pm_pdpt);
 #else
@@ -1312,7 +1316,7 @@
 					break;
 			}
 		}
-		mtx_unlock_spin(&lazypmap_lock);
+		mtx_unlock_spin(&smp_tlb_mtx);
 		if (spins == 0)
 			printf("pmap_lazyfix: spun for 50000000\n");
 	}

-- 
John Baldwin <jhb_at_FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

Received on Wed Jun 23 2004 - 12:39:22 UTC
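
For anyone following the register arithmetic in the message above, here is a
small user-space sketch that re-derives the two vectors from the dumped values.
It is only an illustration: the helper names are hypothetical and do not come
from the FreeBSD sources. It assumes the standard local APIC layout, where the
eight 32-bit ISR registers cover vectors 0-255 in order, the low byte of the
ICR low word is the vector, and bits 18-19 are the destination shorthand
(3 meaning "all excluding self").

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of the FreeBSD kernel. */

/* Bit b of ISRn corresponds to vector n * 32 + b; return the highest one. */
static int
isr_highest_vector(const uint32_t isr[8])
{
	for (int reg = 7; reg >= 0; reg--) {
		for (int bit = 31; bit >= 0; bit--) {
			if (isr[reg] & (UINT32_C(1) << bit))
				return (reg * 32 + bit);
		}
	}
	return (-1);	/* nothing in service */
}

/* Bits 0-7 of the ICR low word hold the vector. */
static unsigned
icr_vector(uint32_t icr_lo)
{
	return (icr_lo & 0xff);
}

/* Bits 18-19 hold the destination shorthand (3 = all excluding self). */
static unsigned
icr_dest_shorthand(uint32_t icr_lo)
{
	return ((icr_lo >> 18) & 0x3);
}

/* Check the values from the dump quoted in this thread. */
void
check_dump_values(void)
{
	const uint32_t cpu1_isr[8] = { 0, 0, 0, 0, 0, 0, 0, 0x80000 };

	assert(isr_highest_vector(cpu1_isr) == 243);	/* 224 + 19 */
	assert(icr_vector(0xc00f6) == 246);		/* CPU 3 icr_lo */
	assert(icr_dest_shorthand(0xc00f6) == 3);	/* all but self */
}
```

Calling check_dump_values() from a test program should pass silently, which
matches the reading in the message: CPU 1 has vector 243 in service, and
CPU 3 last sent vector 246 with the all-but-self shorthand.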