>> Are those the only MCA errors you're seeing? The reason I ask is that there's an errata in the X5600 series which can cause an "internal timer error" MCA to be logged after another uncorrectable MCA occurs.

About 90% of the crashes are these MCA errors. For the remaining 10% there is no log at all. For example, one of the Supermicro servers rebooted two days ago but did not write a crash dump to the /var/crash directory, even though dumps are enabled in rc.conf:

dumpdev="AUTO"
dumpdir="/var/crash"

>> This seems to me like it would be a CPU failure. Can you try replacing the CPU itself? I've seen this exact message on a different board, and the cause was a failing CPU.

We're thinking of replacing the X5690 CPUs with X5675s.

>> Well, mcelog has this hardcoded and prints this for every MCA just as a matter of course. It isn't selective but assumes every machine check is a hardware error (which they are, though some are warnings for corrected events that you can ignore as the hardware hasn't degraded enough to warrant replacement. However, corrected events don't generate panics, just messages in the logs, and only a subset of corrected events include the "yellow / green" indicators for which you can ignore "green" events. Even corrected ECC errors I would ignore if you get a few events with a count of 1 that don't recur).

Each time the MCA error occurs, the server goes down. Could you please advise how we should tackle this issue?

>> Depending on the CPU model, you can determine more info about the error using the CPU manuals (for Intel the SDM).

The CPU is an X5690 (in a Supermicro server). Is there a link where we can get the manual for it?

--
View this message in context: http://freebsd.1045724.n5.nabble.com/FreeBsd-MCA-Panic-Crash-tp6064691p6065043.html
Sent from the freebsd-current mailing list archive at Nabble.com.

Received on Tue Jan 05 2016 - 09:25:09 UTC
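As a rough illustration of the SDM-based decoding suggested in the thread, here is a minimal Python sketch that splits a raw IA32_MCi_STATUS value (the "Status 0x..." number logged with each MCA record) into its architectural fields. It assumes the bit layout documented in the Intel SDM Vol. 3B machine-check chapter; the function name decode_mci_status and the example value are hypothetical and not taken from this thread or from mcelog. The model-specific error code (bits 16-31) is extracted but not interpreted, since its meaning depends on the exact CPU model.

```python
# Minimal sketch, assuming the IA32_MCi_STATUS layout from the Intel SDM
# Vol. 3B. decode_mci_status is a hypothetical helper, not an mcelog API.

# Architectural flag bits in the upper byte of IA32_MCi_STATUS.
FLAG_BITS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # overflow: an earlier error was lost
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled for this error
    59: "MISCV",  # IA32_MCi_MISC holds extra information
    58: "ADDRV",  # IA32_MCi_ADDR holds the error address
    57: "PCC",    # processor context corrupt
}

# Threshold-based error status (bits 54:53). Only meaningful for corrected
# errors on CPUs that advertise MCG_TES_P; these are the "green"/"yellow"
# indicators mentioned in the thread.
THRESHOLD_STATUS = {0: "no tracking", 1: "green", 2: "yellow", 3: "reserved"}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit IA32_MCi_STATUS value into its architectural fields."""
    flags = [name for bit, name in FLAG_BITS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF               # bits 15:0, MCA error code
    result = {
        "flags": flags,
        "mca_error_code": mca_code,
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
        "threshold_status": THRESHOLD_STATUS[(status >> 53) & 0x3],
        # Bits 52:38 hold a corrected-error count on CMCI-capable CPUs.
        "corrected_count": (status >> 38) & 0x7FFF,
    }
    # 0x0400 is the simple MCA error code the SDM lists as an internal timer error.
    if mca_code == 0x0400:
        result["description"] = "internal timer error"
    return result
```

For example, with a made-up uncorrected internal timer error value:

```python
info = decode_mci_status(0xB200000000000400)
print(info["flags"], info.get("description"))  # ['VAL', 'UC', 'EN', 'PCC'] internal timer error
```

A set PCC or UC flag is consistent with the panics described above, whereas a corrected event with a "green" threshold status is the kind the thread suggests can be ignored.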