Re: FreeBsd MCA Panic Crash !!

From: John Baldwin <jhb_at_freebsd.org>
Date: Mon, 04 Jan 2016 07:10:18 -0800
On Monday, January 04, 2016 02:17:51 PM Steven Hartland wrote:
> Bank 5 seems to be common to all the crashes, which may suggest you have 
> some dodgy ram or possibly the driving CPU's memory controller.

No, this has nothing to do with that.  Bank 5 means that bank 5 of the
machine check registers in the processor (MC5_*) is the one reporting the
errors.  Different "banks" of the MC registers handle errors for different
parts of the hardware (and this varies by CPU).  For example, on Nehalem
CPUs, the memory controller logs errors (e.g. ECC errors) in bank 8, but
that has no correlation to the "bank" of DIMMs that the error occurred in.
Later Intel CPUs can log the same errors in register banks 8 through 12
(IIRC).  Depending on the CPU model, you can determine more info about the
error using the CPU manuals (for Intel the SDM).
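
To make the "bank" numbering concrete, here is a minimal sketch of how a
bank number maps to MSR addresses.  It assumes the architectural layout
from the Intel SDM, where each bank has four consecutive MSRs starting at
IA32_MC0_CTL (0x400).  The helper name mc_bank_msrs is made up for
illustration and is not part of any tool.

```python
# Sketch only: architectural machine-check bank layout as described in the
# Intel SDM.  Bank i owns four consecutive MSRs starting at 0x400 + 4*i.
IA32_MC0_CTL = 0x400


def mc_bank_msrs(bank):
    """Return the MSR addresses of the given machine-check register bank."""
    base = IA32_MC0_CTL + 4 * bank
    return {
        "CTL": base,         # IA32_MCi_CTL: per-bank error reporting enables
        "STATUS": base + 1,  # IA32_MCi_STATUS: what was logged
        "ADDR": base + 2,    # IA32_MCi_ADDR: address, if STATUS.ADDRV is set
        "MISC": base + 3,    # IA32_MCi_MISC: extra info, if STATUS.MISCV is set
    }


if __name__ == "__main__":
    # Bank 5, the one showing up in the reports in this thread.
    for name, addr in mc_bank_msrs(5).items():
        print(f"MC5_{name}: {addr:#x}")
```

Which hardware unit sits behind a given bank is still model-specific, so
the bank number only tells you which registers to look up in the manual
for your CPU, not which DIMM is involved.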

> As the error says this is a Hardware issue.

Well, mcelog has this hardcoded and prints this for every MCA just as a
matter of course.  It isn't selective but assumes every machine check is
a hardware error (which they are), though some are warnings for corrected
events that you can ignore, as the hardware hasn't degraded enough to
warrant replacement.  However, corrected events don't generate panics,
just messages in the logs, and only a subset of corrected events include
the "yellow / green" indicators, for which you can ignore "green" events.
Even corrected ECC errors I would ignore if you get a few events with
a count of 1 that don't recur.
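
As a rough illustration of the corrected vs. uncorrected distinction and
the "yellow / green" indicators, the sketch below decodes the
architectural fields of an MCi_STATUS value as laid out in the Intel SDM.
The function name and the example value are made up for illustration.  It
also assumes the CPU supports threshold-based status and corrected error
counts (MCG_CAP bits 11 and 10); without those the two fields are not
meaningful.

```python
# Sketch only: decode architectural IA32_MCi_STATUS fields (Intel SDM).
# Assumes MCG_CAP reports threshold-based error status and corrected
# error counts; otherwise "threshold" and "corrected_count" mean nothing.
THRESHOLD = {0: "none", 1: "green", 2: "yellow", 3: "reserved"}


def decode_mc_status(status):
    def bit(n):
        return bool((status >> n) & 1)

    uncorrected = bit(61)
    return {
        "valid": bit(63),        # VAL: the register holds a logged error
        "overflow": bit(62),     # OVER: an earlier error was overwritten
        "uncorrected": uncorrected,  # UC: hardware could not correct it
        "enabled": bit(60),      # EN: reporting enabled in MCi_CTL
        "miscv": bit(59),        # MISCV: MCi_MISC holds valid data
        "addrv": bit(58),        # ADDRV: MCi_ADDR holds a valid address
        "pcc": bit(57),          # PCC: processor context corrupt
        # Bits 54:53 carry green/yellow only for corrected errors.
        "threshold": None if uncorrected else THRESHOLD[(status >> 53) & 0x3],
        "corrected_count": (status >> 38) & 0x7FFF,      # bits 52:38
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
        "mca_error_code": status & 0xFFFF,               # bits 15:0
    }


# Arbitrary example value, not taken from any real log: a valid, enabled,
# corrected event with a count of 1 and a "green" indicator.
example_status = (1 << 63) | (1 << 60) | (1 << 53) | (1 << 38) | 0x009F

if __name__ == "__main__":
    for field, value in decode_mc_status(example_status).items():
        print(f"{field}: {value}")
```

An event that decodes as corrected with a count of 1 is the kind I would
ignore unless it recurs.  Anything with "uncorrected" or "pcc" set is a
different matter.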

-- 
John Baldwin
Received on Mon Jan 04 2016 - 14:10:45 UTC
