Re: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAM status: ATA Status Error

From: Daniel Kalchev <daniel_at_digsys.bg> Date: Wed, 13 Dec 2017 13:25:20 +0200

> On 13 Dec 2017, at 1:26, Freddie Cash <fjwcash_at_gmail.com> wrote:
> 
> On Tue, Dec 12, 2017 at 2:55 PM, Rodney W. Grimes <
> freebsd-rwg_at_pdx.rh.cn85.dnsmgr.net> wrote:
> 
>> Hum, just noticed this.  25k hours power on, 2M load cycles, this is
>> very hard on a hard drive.  Your drive is going into power save mode
>> and unloading the heads.  In fact at a rate of 81 times per hour?
>> Oh, I can not believe that.  Either way we need to get this stopped,
>> it shall wear your drives out.
>> 
> 
> Believe it.  :)  The WD Green drives have a head parking timeout of 15
> seconds, and no way to disable that anymore.  You used to be able to boot
> into DOS and run the tler.exe program from WD to disable the auto-parking
> feature, but they removed that ability fairly quickly.
> 
> The Green drives are meant to be used in systems that spend most of their
> time idle.  Trying to use them in an always-on RAID array is just asking
> for trouble.  They are only warrantied for a couple hundred thousand head
> parkings or something ridiculous like that.  2 million puts it way out of
> the warranty coverage.  :(
> 
> We had 24 of them in a ZFS pool back when they were first released as they
> were very inexpensive.  They led to more downtime and replacement costs
> than any other drive we've used since (or even before).  Just don't use
> them in any kind of RAID array or always-on system.
> 

To handle drives like this, and more generally to get rid of load cycles, I use smartd on all my ZFS pools with this line of config:

DEVICESCAN -a -o off -e apm,off

Here -o off disables automatic offline testing and -e apm,off turns off the drive's Advanced Power Management. It might not be the best solution, but since it is applied at boot, S.M.A.R.T. attribute 193 (Load_Cycle_Count) no longer increases. I am not a fan of WD drives, but I have a few tens of them, and all of them “behave” in one way or another.
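
To check whether the load cycling has actually stopped, the rate quoted above (about 2M load cycles over 25k power-on hours) can be recomputed from the SMART attribute table. This is only a minimal sketch: the function name load_cycle_rate is made up for illustration, and it assumes the usual `smartctl -A` table layout, where the attribute ID is in the first column and the raw value in the tenth.

```shell
#!/bin/sh
# load_cycle_rate (hypothetical helper): read `smartctl -A` output on stdin
# and print load cycles per power-on hour, from attributes 9 and 193.
# Assumes the raw value is the 10th whitespace-separated field.
load_cycle_rate() {
    awk '
        $1 == 9   { hours  = $10 + 0 }   # Power_On_Hours
        $1 == 193 { cycles = $10 + 0 }   # Load_Cycle_Count
        END {
            if (hours > 0) printf "%.1f\n", cycles / hours
            else           print "n/a"
        }
    '
}

# Usage (ada6 is the device from the subject line; needs root):
#   smartctl -A /dev/ada6 | load_cycle_rate
```

Run it once, wait a day, and run it again: if the smartd setting took effect, the raw Load_Cycle_Count should stay flat and the computed rate should slowly fall.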

As for the original question: if I do not have a spare disk to swap in, on a raidz1/raidz2 pool I would typically do:

zpool offline poolname disk
dd if=/dev/zero of=/dev/disk bs=1m
zpool replace poolname disk

This fills the disk with zeros, forcing the drive to remap any suspected unreadable blocks. After this operation there are no more pending blocks and the like. On large drives or pools, however, the last step (the resilver after zpool replace) takes a few days to complete. Over the years I have used this procedure on many drives, sometimes more than once on the same drive, and it postponed both having to replace the drive and the annoying S.M.A.R.T. messages. Those by themselves might not be a major problem, but it is better not to have the logs filled with warnings all the time.

I feel more confident doing this on raidz2 vdevs anyway, since a raidz1 vdev has no redundancy left while the disk is offline.
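
Because the dd step destroys everything on the target, it may be worth printing the exact commands for review before running any of them. A minimal sketch, assuming a hypothetical helper named rewrite_disk_cmds; it only echoes the three commands from above with the pool and disk names filled in, and runs nothing itself.

```shell
#!/bin/sh
# rewrite_disk_cmds POOL DISK (hypothetical helper): print, but do not run,
# the offline / zero-fill / replace sequence for one disk of a pool.
rewrite_disk_cmds() {
    pool=$1
    disk=$2
    if [ -z "$pool" ] || [ -z "$disk" ]; then
        echo "usage: rewrite_disk_cmds POOL DISK" >&2
        return 1
    fi
    echo "zpool offline $pool $disk"
    echo "dd if=/dev/zero of=/dev/$disk bs=1m"
    echo "zpool replace $pool $disk"
}

# Usage ("tank" is a placeholder pool name):
#   rewrite_disk_cmds tank ada6
```

Once the replace has been issued, `zpool status poolname` shows how far the resilver has progressed.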

If I had a spare disk and a spare port, I would just do:

zpool replace poolname disk newdisk

Daniel