RE: 11-CURRENT r275641 panic: Unrecoverable machine check exception

From: Rang, Anton <anton.rang_at_isilon.com>
Date: Mon, 15 Dec 2014 17:49:54 +0000
> I certainly could be wrong - but how to know for sure the cause of the panic?

> MCA: CPU 0 UNCOR PCC OVER DCACHE L2 DRD error
> MCA: Address 0xbd8d4cc0
> MCA: Misc 0x30e3000086

The "root cause" may be hard to determine, but the immediate cause was helpfully decoded by the kernel. (Though I don't know whether all of the model-specific fields were decoded.)

UNCOR = uncorrected error
PCC = processor context corrupted (can't safely continue to execute, thus the panic)
OVER = error overflow (hmmm, multiple errors occurred)
DCACHE L2 DRD = data being read from L2 data cache

The miscellaneous register indicates that 0xbd8d4cc0 is a physical address.

So this looks like a processor failure. If it is repeatable, it may indicate either failed hardware or a problem in how the processor is configured (though I'm not sure how that could lead to a cache error).

Anton
Received on Mon Dec 15 2014 - 17:07:03 UTC
