On Monday, January 04, 2016 02:17:51 PM Steven Hartland wrote:
> Bank 5 seems to be common to all the crashes, which may suggest you have
> some dodgy ram or possibly the driving CPU's memory controller.

No, this has nothing to do with that.  Bank 5 means that it is bank 5 of the
machine check registers in the processor that is triggering the errors
(MC5_*).  Different "banks" of the MC registers handle errors for different
parts of the hardware (and this varies by CPU).  For example, on Nehalem CPUs,
the memory controller logs errors (e.g. ECC errors) in bank 8, but that has
no correlation to the "bank" of DIMMs that the error occurred in.  Later Intel
CPUs can log the same errors in register banks 8 through 12 (IIRC).

Depending on the CPU model, you can determine more info about the error using
the CPU manuals (for Intel, the SDM).

> As the error says this is a Hardware issue.

Well, mcelog has this hardcoded and prints it for every MCA as a matter of
course.  It isn't selective but assumes every machine check is a hardware
error (which they are, though some are warnings for corrected events that you
can ignore as the hardware hasn't degraded enough to warrant replacement).
However, corrected events don't generate panics, just messages in the logs,
and only a subset of corrected events include the "yellow / green"
indicators, of which you can ignore the "green" events.  Even corrected ECC
errors I would ignore if you get a few events with a count of 1 that don't
recur.

-- 
John Baldwin

Received on Mon Jan 04 2016 - 14:10:45 UTC
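
To illustrate what "looking it up in the SDM" amounts to, here is a minimal Python sketch that splits a raw MCi_STATUS value into its architectural fields. It is not taken from mcelog or from the FreeBSD sources. The function names (decode_mc_status, is_green_corrected) are made up for this example, and the bit positions assume the Intel layout described in the SDM. The corrected-error count and the green/yellow field are only meaningful when the CPU advertises them in IA32_MCG_CAP and the error is a corrected one; other vendors and older models may lay these bits out differently.

```python
# Hypothetical helper: field positions assume the Intel MCi_STATUS layout
# from the SDM; check your CPU's manual before relying on any of them.

THRESHOLD_NAMES = {0: "none", 1: "green", 2: "yellow", 3: "reserved"}


def decode_mc_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into named fields."""

    def bit(n: int) -> bool:
        return bool((status >> n) & 1)

    return {
        "valid": bit(63),            # VAL: register holds a logged error
        "overflow": bit(62),         # OVER: an earlier error was overwritten
        "uncorrected": bit(61),      # UC: hardware could not correct it
        "enabled": bit(60),          # EN: reporting was enabled in MCi_CTL
        "misc_valid": bit(59),       # MISCV: MCi_MISC has extra info
        "addr_valid": bit(58),       # ADDRV: MCi_ADDR has an address
        "context_corrupt": bit(57),  # PCC: processor context corrupted
        # Bits 54:53, assumed valid only for corrected errors on CPUs
        # that report threshold-based status.
        "threshold_status": THRESHOLD_NAMES[(status >> 53) & 0x3],
        # Bits 52:38, assumed valid only on CPUs that count corrected errors.
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


def is_green_corrected(fields: dict) -> bool:
    """True for a valid, corrected event flagged 'green'."""
    return (
        fields["valid"]
        and not fields["uncorrected"]
        and fields["threshold_status"] == "green"
    )


# A made-up status value: valid, corrected, green, count of 1.
example = (1 << 63) | (1 << 53) | (1 << 38) | 0x009F
fields = decode_mc_status(example)
print(fields["threshold_status"], fields["corrected_count"])
```

Note that the bank number itself is not in this register; it is simply the index i of the MCi_STATUS register the value was read from, which is why it says nothing about which DIMM "bank" was involved.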