Re: MCA messages in /var/log/message?

From: John Baldwin <jhb_at_freebsd.org>
Date: Fri, 23 Apr 2010 09:48:28 -0400
On Thursday 22 April 2010 6:28:34 pm Steve Kargl wrote:
> How does one interpret the following MCA message?
> 
> MCA: Bank 4, Status 0x945a4000d6080a13
> MCA: Global Cap 0x0000000000000105, Status 0x0000000000000000
> MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 0
> MCA: CPU 0 COR BUSLG Responder RD Memory
> MCA: Address 0x70c42280
> MCA: Bank 4, Status 0x942140012a080813
> MCA: Global Cap 0x0000000000000105, Status 0x0000000000000000
> MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 1
> MCA: CPU 1 COR BUSLG Source RD Memory
> MCA: Address 0x1b97ca578
> 
> It appears that these messages coincide with a 15 to 30
> second period during which my USB mouse inexplicably loses a
> large number of button clicks (which is quite noticeable
> with firefox3).

If you have access to p4, you can download a patched version of mcelog that
will parse these for you from //depot/projects/mcelog/... (you have to build
it with 'make FREEBSD=yes').
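Without p4 access, the architecturally defined part of each Status value can still be decoded by hand. The Python sketch below is a hypothetical helper (decode_status is not part of mcelog or FreeBSD). It assumes the generic x86 machine-check layout: flag bits 63 to 57, and a 16-bit error code in bits 15:0 whose bus-error form is 0000 1PPT RRRR IILL. Model-specific bits are ignored.

```python
# Sketch: decode the architectural fields of an x86 MCi_STATUS value.
# Assumes the generic machine-check layout; model-specific bits
# (such as the AMD northbridge ECC syndrome) are ignored here.

FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # MCi_MISC has extra information
    58: "ADDRV",  # MCi_ADDR has the error address
    57: "PCC",    # processor context is corrupt
}

# Field values for the compound bus-error code 0000 1PPT RRRR IILL.
PARTICIPATION = {0: "Source", 1: "Responder", 2: "Observer", 3: "Generic"}
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
MEM_IO = {0: "Memory", 1: "Reserved", 2: "I/O", 3: "Other"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}


def decode_status(status: int) -> dict:
    """Return the flag names and, for bus errors, the decoded error code."""
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF  # architectural MCA error code, bits 15:0
    result = {"flags": flags, "corrected": "UC" not in flags, "code": code}
    if (code & 0xF800) == 0x0800:  # bit 11 set, bits 15:12 clear: bus error
        result["bus"] = {
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 0x1),
            "request": REQUEST.get((code >> 4) & 0xF, "reserved"),
            "mem_io": MEM_IO[(code >> 2) & 0x3],
            "level": LEVEL[code & 0x3],
        }
    return result


if __name__ == "__main__":
    # The two Status values from the kernel messages quoted above.
    for value in (0x945A4000D6080A13, 0x942140012A080813):
        print(hex(value), decode_status(value))
```

For both Status values this reports VAL, EN and ADDRV with UC clear (a corrected error with a valid address), followed by a bus error: Responder and Source respectively, no timeout, RD request, Memory, level LG. That is what the kernel abbreviates as "COR BUSLG Responder RD Memory" and "COR BUSLG Source RD Memory".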

Hmm, I ran it and here is what it said:

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge
ADDR 70c42280
  Northbridge RAM Chipkill ECC error
  Chipkill ECC syndrome = d6b4
       bit46 = corrected ecc error
  bus error 'local node response, request didn't time out
             generic read mem transaction
             memory access, level generic'
STATUS 945a4000d6080a13 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 15 Model 5
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge
ADDR 1b97ca578
  Northbridge RAM Chipkill ECC error
  Chipkill ECC syndrome = 2a42
       bit32 = err cpu0
       bit46 = corrected ecc error
  bus error 'local node origin, request didn't time out
             generic read mem transaction
             memory access, level generic'
STATUS 942140012a080813 MCGSTATUS 0
MCGCAP 105 APICID 1 SOCKETID 0
CPUID Vendor AMD Family 15 Model 5
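The "Chipkill ECC syndrome", "bit46" and "bit32" lines come from model-specific bits of the northbridge (bank 4) status word. The rough Python sketch below shows one way to extract them. It assumes the AMD family 0Fh layout, which matches the mcelog output above: syndrome bits 7:0 in status bits 54:47, chipkill syndrome bits 15:8 in status bits 31:24, bit 46 for a corrected ECC error, and bit 32 for an error on CPU 0. It also shows how "ID 0xf5a" becomes "Family 15 Model 5" and how "Global Cap 0x105" encodes five reporting banks. All function names are hypothetical.

```python
# Sketch: model-specific fields of an AMD family 0Fh northbridge (bank 4)
# status word, plus the CPUID and MCG_CAP values from the same messages.
# ASSUMPTION: bit positions follow the family 0Fh layout that mcelog
# reports; other CPU families lay these bits out differently.

def k8_nb_fields(status: int) -> dict:
    syndrome_lo = (status >> 47) & 0xFF  # syndrome bits 7:0 (status 54:47)
    syndrome_hi = (status >> 24) & 0xFF  # chipkill bits 15:8 (status 31:24)
    return {
        "syndrome": (syndrome_hi << 8) | syndrome_lo,
        "corrected_ecc": bool((status >> 46) & 1),  # mcelog "bit46"
        "err_cpu0": bool((status >> 32) & 1),       # mcelog "bit32"
    }


def cpuid_family_model(cpu_id: int) -> tuple:
    """Decode CPUID leaf 1 EAX (the 'ID' value FreeBSD prints)."""
    family = (cpu_id >> 8) & 0xF
    model = (cpu_id >> 4) & 0xF
    stepping = cpu_id & 0xF
    if family == 0xF:  # AMD: extended fields apply when base family is 0Fh
        family += (cpu_id >> 20) & 0xFF
        model |= ((cpu_id >> 16) & 0xF) << 4
    return family, model, stepping


def mcg_cap_banks(cap: int) -> int:
    return cap & 0xFF  # MCG_CAP bits 7:0: number of error-reporting banks


if __name__ == "__main__":
    for value in (0x945A4000D6080A13, 0x942140012A080813):
        fields = k8_nb_fields(value)
        print(hex(value), hex(fields["syndrome"]), fields)
    print(cpuid_family_model(0xF5A), mcg_cap_banks(0x105))
```

On the two Status values this gives syndromes 0xd6b4 and 0x2a42, with bit 46 set in both and bit 32 set only in the second, which agrees with the mcelog report.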

Note that these are corrected errors, so the RAM may not actually be bad; they
may just be transient failures.
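One rough way to tell transient corrected errors from a failing memory cell is to check whether the same addresses keep recurring. This is a common heuristic and is assumed here rather than stated in the thread. The minimal Python sketch below assumes the FreeBSD kernel line format quoted above, possibly preceded by a syslog timestamp and host prefix. The function name tally_corrected is hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: kernel lines look like the "MCA: ..." lines quoted above;
# re.search tolerates any syslog prefix in front of them.
BANK_RE = re.compile(r"MCA: Bank \d+")
COR_RE = re.compile(r"MCA: CPU \d+ COR\b")
ADDR_RE = re.compile(r"MCA: Address (0x[0-9a-fA-F]+)")


def tally_corrected(lines) -> Counter:
    """Count corrected-error records per reported physical address."""
    counts = Counter()
    pending = False  # inside a corrected record whose address is not yet seen
    for line in lines:
        addr = ADDR_RE.search(line)
        if BANK_RE.search(line):
            pending = False  # a new record starts
        elif COR_RE.search(line):
            pending = True
        elif pending and addr:
            counts[int(addr.group(1), 16)] += 1
            pending = False
    return counts


if __name__ == "__main__":
    sample = [
        "MCA: Bank 4, Status 0x945a4000d6080a13",
        "MCA: CPU 0 COR BUSLG Responder RD Memory",
        "MCA: Address 0x70c42280",
        "MCA: Bank 4, Status 0x942140012a080813",
        "MCA: CPU 1 COR BUSLG Source RD Memory",
        "MCA: Address 0x1b97ca578",
    ]
    for address, n in tally_corrected(sample).most_common():
        print(hex(address), n)
```

In the two records above, each address appears once, which by itself says little. A count that keeps growing at one address over days would be more suggestive of a bad DIMM than of transient failures.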

-- 
John Baldwin
Received on Fri Apr 23 2010 - 11:50:52 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:03 UTC