:Thanks for the detailed info on this.  It looks like CPU1 is trying
:to service the interrupt because PPR = 0xf0, and TPR = 0x00.  It is

Yup.

:also the only CPU that has a bit set in ISR.  In this case, CPU 3
:was initiating the IPI (although I don't know why its icr_lo is
:0xc00f6 because I was expecting it to be 0xc00f3 (and it was in
:previous lockups).  I still have no idea why CPU1 is not handling
:this interrupt though.  I am still getting used to this emulator, but
:I think the values I am reading are believable:

Yup.  It would be nice to see the contents of the TMR and IRR as well, but the ISR is very important.  If the ISR bit is set it means that the APIC delivered the interrupt to the cpu but the cpu has not yet EOI'd it.  This would account for the contents of the PPR (it is based on the highest set ISR bit.  When you EOI the APIC it knows what to EOI based on the PPR).

It is possible that the cpu got the interrupt and started processing it, then deadlocked in a mutex and went to idle before it had a chance to EOI it.  I haven't looked at the IPI handling path in 5.x recently so this might not be a possible case, but so far it's the only thing that fits the bill.  Perhaps it tried to use a sleep mutex when it really should have been using a spin mutex.

Once the APIC has delivered the interrupt to the cpu nothing more happens to that interrupt until it is EOI'd.  If the cpu has not EOI'd that interrupt and HLT's, it will *NOT* be reinterrupted by that particular interrupt.

What I would do now is (attempt to) get some sort of core that you can run gdb on, or attach gdb to.  I'm afraid I can't help much there.  But what you want to do is get a stack backtrace for every single thread in the system, looking for a stuck mutex in an IPI path interrupting the normal thread's operation.

Again, I don't know if this scenario is possible, but it's the only thing I can think of.  Perhaps John can narrow down the possibilities some more.
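For anyone who wants to check the register arithmetic, here is a minimal sketch in Python.  It is only an illustration: the function names are made up for this example, and the bit layout is the standard local APIC layout as I understand it (ISRn covers vectors 32*n through 32*n+31, the PPR is the higher of the TPR class and the class of the highest in-service vector, and ICR_LO carries the vector in bits 0-7 and the destination shorthand in bits 18-19).  It is not taken from the FreeBSD sources.

```python
# Hypothetical helpers for decoding local APIC register values by hand.
# Names and structure are illustrative only, not from any kernel source.

DEST_SHORTHAND = {
    0: "none",
    1: "self",
    2: "all including self",
    3: "all excluding self",
}


def isr_vectors(isr_regs):
    """Return the vectors whose in-service bits are set.

    isr_regs maps a register index (0..7) to its 32-bit value.
    Register n covers vectors 32*n .. 32*n+31.
    """
    vectors = []
    for index, value in sorted(isr_regs.items()):
        for bit in range(32):
            if value & (1 << bit):
                vectors.append(32 * index + bit)
    return vectors


def ppr(tpr, isr_regs):
    """Compute the processor priority from the TPR and the ISR contents."""
    vectors = isr_vectors(isr_regs)
    isrv = max(vectors) if vectors else 0
    # If the TPR class is at least the in-service class, the PPR is the TPR;
    # otherwise it is the in-service class with the low nibble cleared.
    if (tpr & 0xF0) >= (isrv & 0xF0):
        return tpr & 0xFF
    return isrv & 0xF0


def decode_icr_lo(value):
    """Split the low word of the ICR into its main fields."""
    return {
        "vector": value & 0xFF,
        "delivery_mode": (value >> 8) & 0x7,     # 0 = fixed
        "dest_mode": (value >> 11) & 0x1,        # 0 = physical
        "delivery_status": (value >> 12) & 0x1,  # 0 = idle, 1 = send pending
        "trigger": (value >> 15) & 0x1,          # 0 = edge
        "shorthand": DEST_SHORTHAND[(value >> 18) & 0x3],
    }


if __name__ == "__main__":
    cpu1_isr = {7: 0x80000}
    print([hex(v) for v in isr_vectors(cpu1_isr)])  # ['0xf3']
    print(hex(ppr(0x0, cpu1_isr)))                  # 0xf0
    print(decode_icr_lo(0xC00F6))
```

With the values quoted in this thread, ISR7 = 0x80000 decodes to vector 0xf3, whose priority class is 0xf0, which matches the PPR on CPU 1.  The icr_lo value 0xc00f6 decodes to vector 0xf6, fixed delivery, physical mode, edge trigger, delivery status idle, shorthand "all excluding self".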
On the ICR_LO values: I don't know what one should expect for the vector portion.  The 'c' in the c00f6 is definitely correct (the last command was sent to all excluding self).  The status fields are 0 (Edge trigger, APIC has accepted the command, physical mode, fixed command).

Since all APICs have S=0 we know that the problem is NOT an APIC-APIC deadlock.  The APICs look to be in good shape.

:CPU 0
:TPR:    0x0        << not priority masked
:PPR:    0x0        << nothing pending
:icr_lo: 0xf3       << ready to accept command

:CPU 1
:ID:     0x7000000
:TPR:    0x0        << not priority masked
:PPR:    0xf0       << priority of delivered but not-yet EOI'd interrupt
:icr_lo: 0xf3       << ready to accept command
:ISR7:   0x80000    << interrupt delivered and is in-service, not yet EOI'd

:CPU 2
:TPR:    0x0        << not priority masked
:PPR:    0x0        << nothing pending
:icr_lo: 0xfb       << ready to accept command

:CPU 3
:ID:     0x1000000
:TPR:    0x0        << not priority masked
:PPR:    0x0        << nothing pending
:icr_lo: 0xc00f6    << ready to accept new command (previous command was
:                      accepted), last sent command was IPI to all-but-self.
:
:Gerrit

-Matt
Matthew Dillon
<dillon_at_backplane.com>

Received on Wed Jun 23 2004 - 03:40:22 UTC