Re: ZFS corrupting data, even just sitting idle

From: Benjamin Close <Benjamin.Close_at_clearchain.com>
Date: Wed, 03 Oct 2007 08:42:23 +0930
Brooks Talley wrote:
> Hi, everyone.  I'm running 7.0-current amd64, built from CVS on September 12.  I've got a 4.5TB ZFS array across 8 750GB drives in a RAIDZ1 + hotspare configuration.
>
> It's corrupting data even just sitting at idle with no access at all.  I had loaded it up with about 4TB of data several weeks ago, then noticed about a week ago that zpool status showed checksum errors.  I ran a scrub and it turned up 122 errors affecting about 20 files.  The errors were spread pretty evenly across the physical disks, so it didn't seem like one bad drive.
>
> I left for vacation and unplugged the network from the machine to ensure that there would be no access to the disk.  There are no cron jobs or anything else running locally that so much as touches the zpool.
>
> Upon returning, I ran a zpool scrub and it found an additional 116 checksum errors in another 17 files, also evenly spread across the physical drives.
>
> The system is running a Supermicro motherboard, Supermicro AOC-SAT-MV8 SATA card, and WD 750GB drives.  2GB memory, no real apps running, just storage.
>
> Anyone seen anything like this?  It's a bit of a concern.
>   
Just adding a 'me too' to the topic. In my case, though, I have confirmed 
it's a Sun 3511 RAID array corrupting the data. With the cache set to 
write-through, everything is perfect. With it set to write-back, random 
checksum errors start appearing. The array is configured as a 4.8TB RAID 
5... occasionally 'refreshing' the parity on the array fixes the 
checksum errors.
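
For anyone who wants to repeat the check on their own pool, the 
scrub/verify cycle we are both describing is just the standard zpool 
commands below; the pool name 'tank' is only a placeholder:

    # 'tank' is a placeholder pool name - substitute your own
    zpool scrub tank        # re-read every block and verify its checksum
    zpool status -v tank    # per-device error counts, plus the files with
                            #   permanent errors once the scrub finishes
    zpool clear tank        # reset the error counters after fixing the cause

The scrub only detects (and, where redundancy allows, repairs) the bad 
blocks; it can't tell you whether the controller cache, the drives, or 
memory introduced them, which is why toggling the 3511 between write-back 
and write-through was the deciding test here.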

Before ZFS, there was no way of knowing where the corruption occurred. 
Now we can find out.
Thanks pjd & others.

Cheers,
    Benjamin
Received on Tue Oct 02 2007 - 21:12:37 UTC
