Re: ZFS corrupting data, even just sitting idle

From: Brooks Talley <brooks_at_illuminati.org>
Date: Tue, 2 Oct 2007 12:07:40 -0700 (PDT)
I do apologize for the subject-verb construction that implied that ZFS itself, or the ZFS code, or anyone responsible for ZFS, or the letter "Z", was corrupting the data rather than merely being subject to the corruption, or at most a potential suspect. I should have said "A storage system composed of the ZFS filesystem, the underlying GEOM system, the kernel, the ATA driver, the firmware and hardware on the SATA card, the PCI bridge, the SATA cables, the drives themselves, the power supply, system case, and surrounding environment including temperature, humidity, and RF fields, is corrupting its data". I just figured that was implicit and that we were all results-oriented rather than blame-oriented. Sorry!

I will look into Soren's ATA driver and see what I can dig up.

Thanks!
-b

----- Original Message -----
From: "Sverre Svenningsen" <ss.alert_at_online.no>
To: "Pawel Jakub Dawidek" <pjd_at_freebsd.org>
Cc: "Brooks Talley" <brooks_at_illuminati.org>, "freebsd-current" <freebsd-current_at_freebsd.org>
Sent: Tuesday, October 2, 2007 11:27:55 AM (GMT-0800) America/Los_Angeles
Subject: Re: ZFS corrupting data, even just sitting idle

On Oct 2, 2007, at 20:14, Pawel Jakub Dawidek wrote:

On Tue, Oct 02, 2007 at 10:04:12AM -0700, Brooks Talley wrote:


Hi, everyone. I'm running 7.0-current amd64, built from CVS on September 12. I've got a 4.5TB ZFS array across 8 750GB drives in a RAIDZ1 + hotspare configuration.
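
For anyone checking the numbers, the 4.5TB figure is consistent with one of the 8 drives being held back as the hot spare and one drive's worth of space in the remaining 7-drive RAIDZ1 going to parity. A rough sketch of that arithmetic follows; the function name and parameters are made up for illustration, and it ignores metadata overhead and the decimal vs. binary unit difference.

```python
def raidz_usable_gb(total_drives, hot_spares, parity_drives, drive_gb):
    """Approximate usable capacity of a single RAIDZ vdev, in GB.

    Hypothetical helper: ignores filesystem metadata, padding, and
    GB-vs-GiB differences, so real pools report somewhat less.
    """
    vdev_drives = total_drives - hot_spares      # drives actually in the vdev
    data_drives = vdev_drives - parity_drives    # RAIDZ1 spends one drive's worth on parity
    return data_drives * drive_gb


usable = raidz_usable_gb(total_drives=8, hot_spares=1, parity_drives=1, drive_gb=750)
print(usable / 1000)  # 4.5 (TB, decimal)
```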


It's corrupting data even just sitting at idle with no access at all. I had loaded it up with about 4TB of data several weeks ago, then noticed that a zpool status showed checksum errors about a week ago. I ran a scrub and it turned up 122 errors affecting about 20 files. The errors were spread across the physical disks pretty evenly, so it didn't seem like one bad drive.


I left for vacation and unplugged the network from the machine to ensure that there would be no access to the disk. There are no cron jobs or anything else running locally that so much as touch the zpool. 


Upon returning, I ran a zpool scrub and it found an additional 116 checksum errors in another 17 files, also evenly spread across the physical drives. 
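
As background on why a scrub can report new errors on a pool nobody has touched: a scrub re-reads every block and compares it against a checksum recorded when the block was written, so damage that crept in below the filesystem shows up as a mismatch. The toy model below illustrates only that idea. It is not ZFS's on-disk format: SHA-256 and the dict layout are assumptions chosen for brevity, and real ZFS keeps the checksum in the parent block pointer rather than next to the data.

```python
import hashlib


def write_block(data: bytes) -> dict:
    """Store a block together with a checksum of its contents (toy model)."""
    return {"data": bytearray(data), "checksum": hashlib.sha256(data).digest()}


def block_is_intact(block: dict) -> bool:
    """Recompute the checksum and compare it with the one saved at write time."""
    return hashlib.sha256(bytes(block["data"])).digest() == block["checksum"]


def scrub(pool: list) -> list:
    """Return the indices of blocks whose contents no longer match their checksum."""
    return [i for i, block in enumerate(pool) if not block_is_intact(block)]


pool = [write_block(f"file contents {i}".encode()) for i in range(4)]
print(scrub(pool))           # []

pool[2]["data"][0] ^= 0x01   # simulate one bit flipped somewhere below the filesystem
print(scrub(pool))           # [2]
```

The checksum only says that a block changed after it was written. It says nothing about which layer (driver, controller, cable, drive, memory) changed it.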


The system is running a Supermicro motherboard, Supermicro AOC-SAT-MV8 SATA card, and WD 750GB drives. 2GB memory, no real apps running, just storage. 


Anyone seen anything like this? It's a bit of a concern. 


OK, and why do you blame ZFS for corrupting data instead of being
thankful that it detects the corruption? I'm quite sure it's not ZFS that is
corrupting your data.


-- 
Pawel Jakub Dawidek http://www.wheel.pl 
pjd_at_FreeBSD.org http://www.FreeBSD.org 
FreeBSD committer Am I Evil? Yes, I Am! 

Supposedly this card uses a Marvell 88SX6081 chipset, which, as far as I could tell, is handled by Soren's ATA driver. Looks like work done elsewhere in the kernel is making that driver misbehave in all sorts of weird ways now.
It's nice that ZFS makes it easy to discover, at least :) 


-Sverre 
Received on Tue Oct 02 2007 - 17:07:40 UTC
