On May 25, 2009, at 05:39 PM, Freddie Cash wrote:

> On Mon, May 25, 2009 at 2:13 AM, Thomas Backman
> <serenity_at_exscape.org> wrote:
>> On May 24, 2009, at 09:02 PM, Thomas Backman wrote:
>>
>>> So, I was playing around with RAID-Z and self-healing...
>>
>> Yet another follow-up to this.
>> It appears that all traces of errors vanish after a reboot. So, say you have
>> a dying disk; ZFS repairs the data for you, and you don't notice (unless you
>> check zpool status). Then you reboot, and there's NO (easy?) way that I can
>> tell to find out that something is wrong with your hardware!
>
> On our storage server that was initially configured using 1 large
> 24-drive raidz2 vdev (don't do that, by the way), we had 1 drive go
> south. "zpool status" was full of errors. And the error counts
> survived reboots. Either that, or the drive was so bad that the error
> counts started increasing right away after a boot. After a week of
> fighting with it to get the new drive to resilver and get added to the
> vdev, we nuked it and re-created it using 3 raidz2 vdevs each
> comprised of 8 drives.
>
> (Un)fortunately, that was the only failure we've had so far, so can't
> really confirm/deny the "error counts reset after reboot".

Was this on FreeBSD?

I have another unfortunate thing to note regarding this: after a reboot,
it's even impossible to tell *which disk* has gone bad, even if the pool
is "uncleared" but otherwise "healed". It simply says that a device has
failed, with no clue as to which one, since they're all "ONLINE"!

Regards,
Thomas

Received on Mon May 25 2009 - 14:12:58 UTC
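A rough workaround sketch, untested, assuming FreeBSD cron/periodic and the standard zpool status output (the script name and log path below are made up for illustration): periodically dump the per-device error counters to a log, so there is still a record of which disk was accumulating READ/WRITE/CKSUM errors after a reboot wipes the counters.

#!/bin/sh
# /usr/local/etc/zpool_status_log.sh -- hypothetical path; run from cron, e.g. hourly.
# Appends a timestamped copy of zpool status to a log so the per-device error
# counters survive a reboot of the machine, even though the pool itself
# forgets them.

LOG=/var/log/zpool-status.log     # hypothetical log location

{
    echo "==== $(date -u '+%Y-%m-%d %H:%M:%S UTC') ===="
    # -v also lists files with unrecoverable errors, along with per-vdev counters
    zpool status -v
} >> "$LOG"

# 'zpool status -x' prints "all pools are healthy" when nothing is wrong;
# anything else gets flagged to syslog so it is visible after reboot too.
if [ "$(zpool status -x)" != "all pools are healthy" ]; then
    logger -t zpoolcheck "ZFS pool problem detected: see $LOG"
fi

With something like that in cron, the device name of the failing disk is still in the log after a reboot, even if zpool status itself shows every device as ONLINE again.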