From: Matthew Dillon [mailto:dillon_at_apollo.backplane.com]

> Probably not P72.. that would result in weird, inconsistent panics
> rather then consistent hangs. To make sure, just cool your cpu down
> a little (open the case and point a big fan at it). If nothing
> changes then it isn't P72.

It's definitely not hot: plenty of blowers, in an air-conditioned room, and it has been qualified in an environmental chamber.

> The STI; HLT sequence is definitely working properly... operating
> systems have depended on that code sequence forever. Going down
> that path is a red herring.
>
> If NMI can't stop the other processors w/ IPI STOP then the PC for
> those cpus that you see in the dump is not necessarily going to be
> where they are actually hung.

It's not that they're hung; the emulator allows me to see the current PC, registers, etc. They really are sitting with interrupts locked off. In the case where I modified the db to time out on the stop IPI, I can believe that the stacks weren't necessarily consistent, although they seemed to be. In the case where I'm using the emulator, it seems correct.

> It kinda sounds like ACPI has bokered the other cpus. I'm not sure
> why one would even *want* to use ACPI to idle down Xeon's in an MP
> system, actually :-)

It's not so much that I want to use ACPI; it's that the machine doesn't boot without it, and it can't be disabled later. You do want the HLT on idle, like the sysctl enabled on releng_4, otherwise the performance goes down and the power goes up.

I will keep digging, thanks muchly for the input. The other option I will pursue is whether the APIC structure has been altered somehow, something changed in there, etc.

--don

Received on Thu Jun 17 2004 - 21:59:11 UTC