Re: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAM status: ATA Status Error

From: Rodney W. Grimes <freebsd-rwg_at_pdx.rh.CN85.dnsmgr.net>
Date: Wed, 13 Dec 2017 08:47:53 -0800 (PST)

> On Tue, 12 Dec 2017 14:58:28 -0800
> Cy Schubert <Cy.Schubert_at_komquats.com> wrote:
> 
> > There are a couple of ways you can address this. You'll need to
> > offline the vdev first. If you've done a smartctl -t long and the
> > test failed, smartctl -a will tell you which block it had an issue
> > with. You can use dd, ddrescue or dd_rescue to dd the block over
> > itself. The drive may rewrite the (weak) block or, if it fails to,
> > it will remap it (subsequently showing as reallocated).
> > 
> > Of course there is a risk. If the sector is any of the boot blocks
> > there is a good chance the server will hang.
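A minimal Python sketch of the rewrite-in-place step described above. It only builds the dd command string and runs nothing. It assumes the LBA comes from the LBA_of_first_error column of `smartctl -l selftest` (also part of `smartctl -a`), that LBAs count 512-byte logical sectors, and that the drive is 512e with 4096-byte physical sectors (check the "Sector Sizes" line of `smartctl -i`). The device `/dev/ada6` and the sample log are illustrative only. Writing the whole physical sector avoids a read-modify-write inside the drive, and `conv=noerror,sync` makes dd write zeros wherever the read fails, so whatever lived in that sector is lost.

```python
import shlex


def first_error_lba(selftest_log: str):
    """Return LBA_of_first_error of the most recent self-test ("# 1"),
    or None if that test reported no failing LBA ("-")."""
    for line in selftest_log.splitlines():
        fields = line.split()
        if fields[:2] == ["#", "1"]:
            last = fields[-1]
            return int(last) if last.isdigit() else None
    return None


def dd_rewrite_command(device: str, lba: int,
                       logical_size: int = 512,
                       physical_size: int = 4096) -> str:
    """Build (but do not run) a dd command that reads the physical sector
    containing logical block `lba` and writes it back in place."""
    if physical_size % logical_size:
        raise ValueError("physical size must be a multiple of logical size")
    # dd's skip= and seek= count in bs-sized blocks, so convert the
    # logical LBA into the index of its physical sector.
    block = lba // (physical_size // logical_size)
    dev = shlex.quote(device)
    return " ".join([
        "dd", f"if={dev}", f"of={dev}",
        f"bs={physical_size}", f"skip={block}", f"seek={block}",
        "count=1",
        # noerror: keep going on a read error; sync: pad a failed read
        # with NUL bytes, so an unreadable sector is written as zeros.
        "conv=noerror,sync",
    ])


# Hypothetical self-test log excerpt in smartctl's layout.
SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     25001         123456789
# 2  Short offline       Completed without error       00%     24990         -
"""

lba = first_error_lba(SAMPLE_LOG)
print(dd_rewrite_command("/dev/ada6", lba))
# prints: dd if=/dev/ada6 of=/dev/ada6 bs=4096 skip=15432098 seek=15432098 count=1 conv=noerror,sync
```

For a drive with 512-byte physical sectors, pass `physical_size=512` and the command degenerates to a single 512-byte block.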
> 
> The drive is part of a dedicated storage-only pool. The boot drive is a
> fast SSD. So I do not care about this - well, to say it more politely:
> I do not have to take care of that aspect.
> 
> > 
> > You have to be *absolutely* sure which sector is the bad one. And
> > there may be more. There is a risk of data loss.
> > 
> > I've used this technique many times. Most times it works perfectly.
> > Other times the affected file is lost but the rest of the file system
> > is recovered. And again there is always the risk.
> > 
> > Replace the disk immediately if you experience a growing succession
> > of pending sectors. Otherwise replace the disk at your earliest
> > convenience.
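The replace-now-or-later advice hinges on whether pending sectors keep growing. A small sketch, assuming `smartctl -A` output is saved periodically (e.g. from cron) and the raw value of attribute 197, Current_Pending_Sector, is compared across snapshots. The threshold of two increases is an arbitrary illustration, not a rule from smartmontools or the drive vendor.

```python
import re


def raw_attribute(smart_a_output: str, name: str):
    """Return the leading integer of RAW_VALUE for attribute `name`
    in `smartctl -A` output, or None if the attribute is absent."""
    for line in smart_a_output.splitlines():
        fields = line.split()
        # ID ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == name:
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def pending_is_growing(readings, min_increases=2):
    """True if the pending-sector count rose at least `min_increases`
    times across consecutive readings (oldest first)."""
    increases = sum(1 for old, new in zip(readings, readings[1:]) if new > old)
    return increases >= min_increases


# Hypothetical snapshot line and reading history.
snapshot = ("197 Current_Pending_Sector  0x0032   200   200   000    "
            "Old_age   Always       -       24")
history = [0, 0, 8, raw_attribute(snapshot, "Current_Pending_Sector")]
print(history, pending_is_growing(history))
# prints: [0, 0, 8, 24] True
```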
> 
> The ZFS scrubbing of the volume ended this morning, leaving the pool in
> a healthy state. After reboot, there was no sign of CAM errors again.
> 
> But there is something else I'm worried about. The mainboard I use is an
> ASRock Z77 Pro4-M. The board has a crippled Intel MCP with 6 SATA ports
> from the chipset, two of them SATA 6 Gb/s and four SATA II (3 Gb/s), and
> one additional chip with two SATA 6 Gb/s ports:
> 
> [...]
> ahci0@pci0:2:0:0:       class=0x010601 card=0x06121849 chip=0x06121b21 rev=0x01 hdr=0x00
>     vendor     = 'ASMedia Technology Inc.'
>     device     = 'ASM1062 Serial ATA Controller'
>     class      = mass storage
>     subclass   = SATA
>     bar   [10] = type I/O Port, range 32, base 0xe050, size 8, enabled
>     bar   [14] = type I/O Port, range 32, base 0xe040, size 4, enabled
>     bar   [18] = type I/O Port, range 32, base 0xe030, size 8, enabled
>     bar   [1c] = type I/O Port, range 32, base 0xe020, size 4, enabled
>     bar   [20] = type I/O Port, range 32, base 0xe000, size 32, enabled
>     bar   [24] = type Memory, range 32, base 0xf7b00000, size 512, enabled
> [...]
> 
> Attached to that ASM1062 SATA chip via the eSATA connector is a backup
> drive, a WD 4 TB Red. It seems that whenever I attach this drive and it
> is online, I experience problems on the ZFS pool, which is attached to
> the MCP SATA ports.

How does this external drive get its power?  Are the earth grounds of
both the system and the external drive power supply closely tied
together?  A plug/unplug event with a slight ground creep can
wreak havoc with device operation.

> Is this possible? I mean, as I asked before, weird/defective cabling
> would trigger a different error pattern (CRC errors). Because the
> external drive is physically decoupled and cannot couple in
> vibrations, bad sector errors seem unlikely to me. But this is
> simply a thought from someone without special knowledge of the
> physics of HDDs.
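The CRC-versus-media question can be checked rather than guessed. A hedged sketch: save `smartctl -A` for a pool disk before attaching the eSATA drive and again after the errors show up, then see which raw counters moved. It assumes the smartmontools default attribute names (199 UDMA_CRC_Error_Count for link errors; 5 Reallocated_Sector_Ct, 197 Current_Pending_Sector and 198 Offline_Uncorrectable for media errors). The grouping and the returned labels are a rough heuristic of my own, not a diagnosis.

```python
import re

LINK = {"UDMA_CRC_Error_Count"}
MEDIA = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
         "Offline_Uncorrectable"}


def raw_attributes(smart_a_output: str) -> dict:
    """Map ATTRIBUTE_NAME -> leading integer of RAW_VALUE from `smartctl -A`."""
    values = {}
    for line in smart_a_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])
            if match:
                values[fields[1]] = int(match.group())
    return values


def classify_change(before: dict, after: dict) -> str:
    """Return 'link', 'media', 'both' or 'none' depending on which
    counters increased between the two snapshots."""
    grew = {name for name, value in after.items() if value > before.get(name, 0)}
    link, media = bool(grew & LINK), bool(grew & MEDIA)
    if link and media:
        return "both"
    if link:
        return "link"
    if media:
        return "media"
    return "none"


# Hypothetical snapshots of one pool disk.
snap_before = raw_attributes(
    "199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0\n"
    "197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0\n")
snap_after = raw_attributes(
    "199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       14\n"
    "197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0\n")
print(classify_change(snap_before, snap_after))
# prints: link
```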

Even if left cabled, does this drive get powered up/down?  

> I think people responding to my thread made it clear that the WD Green
> isn't the first-choice solution for a 20/6 (not 24/7) duty drive and,
> given that they have now been in service for more than 25000 hours,
> it would be wise to replace them with alternatives.

I think someone had an apm command that turns off the head parking;
that would do wonders for drive life.   On the other hand, I think
if it was my data and I saw that the drive had 2M head load cycles
I would be looking to get any data I could not easily replace off
that drive.  If it was well backed up or easily replaced
my worries would be less.
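To put 2M load cycles in perspective, divide Load_Cycle_Count (attribute 193) by Power_On_Hours (attribute 9) from `smartctl -A`. A minimal sketch using the round figures from this thread (2M cycles, roughly 25000 hours); real numbers should be read from the drive itself.

```python
def load_cycle_rate(load_cycles: int, power_on_hours: int):
    """Return (head load cycles per powered-on hour,
    average seconds between load cycles)."""
    if power_on_hours <= 0 or load_cycles <= 0:
        raise ValueError("need positive load_cycles and power_on_hours")
    per_hour = load_cycles / power_on_hours
    seconds_between = 3600 / per_hour
    return per_hour, seconds_between


per_hour, gap = load_cycle_rate(2_000_000, 25_000)
print(f"{per_hour:.0f} load cycles/hour, one every {gap:.0f} s on average")
# prints: 80 load cycles/hour, one every 45 s on average
```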

... 275 lines removed ...
-- 
Rod Grimes                                                 rgrimes_at_freebsd.org