Re: Strange ZFS filesystem corruption

From: Paul Mather <paul_at_gromit.dlib.vt.edu>
Date: Tue, 4 Oct 2011 10:31:22 -0400

On Oct 3, 2011, at 6:19 PM, Artem Belevich wrote:

> On Mon, Oct 3, 2011 at 11:21 AM, Paul Mather <paul_at_gromit.dlib.vt.edu> wrote:
>> =====
>> 
>> The pool itself reports no errors.  I performed a scrub on the pool yet this bizarre filesystem corruption persists:
>> 
>> =====
>> tape# zpool status backups
>>  pool: backups
>>  state: ONLINE
>>  scan: scrub repaired 15K in 7h33m with 0 errors on Sat Oct  1 19:22:35 2011
> 
> The pool *did* report 15K of data that the scrub was able to repair.
> 
> I'd start with testing your RAM with memtest86 or memtest86+. ZFS
> errors without reported checksum errors may be the sign of bad memory.
> I.e. data gets corrupted before ZFS gets to calculate checksum and
> later invalid data with valid checksum gets written to disk.
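
To make the failure mode described above concrete, here is a minimal sketch. It is not ZFS code: `zlib.crc32` is only a stand-in for the checksums ZFS actually uses, and `write_block` and `verify_block` are hypothetical helpers invented for this illustration. It shows that a bit flipped in memory before the checksum is computed still verifies cleanly, whereas a bit flipped after the write is caught.

```python
import zlib


def write_block(data: bytes):
    """Hypothetical helper: checksum the buffer as it sits in memory,
    then return what would be stored (data plus checksum)."""
    return data, zlib.crc32(data)


def verify_block(data: bytes, checksum: int) -> bool:
    """Hypothetical helper: what a scrub-like pass would check."""
    return zlib.crc32(data) == checksum


original = b"important backup data"

# Case 1: a bit flips in RAM *before* the checksum is computed.
in_memory = bytearray(original)
in_memory[0] ^= 0x01
stored, csum = write_block(bytes(in_memory))
passes_verification = verify_block(stored, csum)  # checksum matches the bad data
matches_original = stored == original             # but the data is wrong

# Case 2: a bit flips on disk *after* a correct write.
stored2, csum2 = write_block(original)
on_disk = bytearray(stored2)
on_disk[0] ^= 0x01
detected = not verify_block(bytes(on_disk), csum2)  # mismatch is caught

print(passes_verification, matches_original, detected)
```

In the first case nothing in the stored block reveals the damage, which is why this kind of corruption would not show up as a checksum error.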


Because this machine has ECC RAM, I checked the BIOS logs for ECC errors (the BIOS is set to log them) and there are no ECC errors logged.  If the RAM were going bad, I would expect it to leave some kind of trace in the BIOS log.

Do uncorrectable ECC errors get logged as MCEs under FreeBSD 9?

I've never noticed any problems when doing a "make -j8 buildworld" on this machine, either.

Cheers,

Paul.
Received on Tue Oct 04 2011 - 12:31:53 UTC
