Re: Apparently spurious ZFS CRC errors (was Re: ZFS data error without reasons)

From: Mark Powell <M.S.Powell_at_salford.ac.uk>
Date: Tue, 7 Apr 2009 14:33:14 +0100 (BST)
On Thu, 26 Mar 2009, Mark Powell wrote:

> On Wed, 25 Mar 2009, Bernd Walter wrote:
>> I don't know if it is with the drives, but other reasons are less
>> likely in my opinion.
>> The system is located in a data center and since I only get a few errors
>> I decided to live with it and not to debug it further.
>
> I've decided to split my drives in two pools; 5x500GB RAIDZ1 of WD5000AAKS 
> and the 6x1TB RAIDZ2 of WD10EADS. I'll see if they perform differently. I'm 
> using the defaults of WC on, with all ZFS options enabled.

OK. I've been running with this config for 13 days now. During that time 
no CRC errors at all have been found on either pool. I have been scrubbing 
both pools simultaneously at 2am, hoping the combined IO load would put 
some strain on the hardware.
   Again, the scrub that ran at 2am today found no CRC errors.
   However, a few hours later CRC errors appeared on both pools. 
Curiously, the errors on both pools appeared at the same time. I have been 
running zpool status from cron every minute, and all of these new CRC 
errors occurred within two consecutive minutes:

-----
# zpool status
   pool: pool
  state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
         attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
         using 'zpool clear' or replace the device with 'zpool replace'.
    see: http://www.sun.com/msg/ZFS-8000-9P
  scrub: scrub in progress for 0h11m, 6.16% done, 2h53m to go
config:

         NAME             STATE     READ WRITE CKSUM
         pool             ONLINE       0     0     0
           raidz1         ONLINE       0     0     0
             stripe/str0  ONLINE       0     0     0
             ad8          ONLINE       0     0     0
             ad10         ONLINE       0     0     0
             ad12         ONLINE       0     0     0
             ad14         ONLINE       0     0     1

errors: No known data errors

   pool: pool2
  state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
         attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
         using 'zpool clear' or replace the device with 'zpool replace'.
    see: http://www.sun.com/msg/ZFS-8000-9P
  scrub: scrub in progress for 0h11m, 2.82% done, 6h29m to go
config:

         NAME        STATE     READ WRITE CKSUM
         pool2       ONLINE       0     0     0
           raidz2    ONLINE       0     0     0
             ad18    ONLINE       0     0     0
             ad20    ONLINE       0     0     4
             ad22    ONLINE       0     0     2
             ad24    ONLINE       0     0     0
             ad26    ONLINE       0     0     0
             ad28    ONLINE       0     0     6

errors: No known data errors
-----
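For anyone wanting to repeat the per-minute check, here is a minimal Python sketch of how two `zpool status` snapshots could be compared to pin new errors to a specific minute. It assumes the output layout shown above (NAME, STATE, READ, WRITE, CKSUM columns with plain integer counts); the function names `parse_counters` and `new_errors` are hypothetical and not part of any ZFS tooling.

```python
import re
from typing import Dict, Optional, Tuple

Counters = Tuple[int, int, int]  # (READ, WRITE, CKSUM)

# A "pool:" header line, e.g. "   pool: pool2".
POOL_LINE = re.compile(r"^\s*pool:\s*(\S+)\s*$")

# A config row such as "ad14  ONLINE  0  0  1": name, state, then the three
# error counters. Assumption: counters are plain integers; zpool can print
# abbreviated values such as "1.2K", which this sketch does not handle.
ROW = re.compile(
    r"^\s+(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def parse_counters(status_text: str) -> Dict[Tuple[str, str], Counters]:
    """Map (pool, vdev or device name) -> (read, write, cksum) counters."""
    counters: Dict[Tuple[str, str], Counters] = {}
    pool: Optional[str] = None
    for line in status_text.splitlines():
        m = POOL_LINE.match(line)
        if m:
            pool = m.group(1)
            continue
        m = ROW.match(line)
        if m and pool is not None:
            read, write, cksum = (int(x) for x in m.group(3, 4, 5))
            # Simplification: names are assumed unique within one pool.
            counters[(pool, m.group(1))] = (read, write, cksum)
    return counters


def new_errors(
    before: Dict[Tuple[str, str], Counters],
    after: Dict[Tuple[str, str], Counters],
) -> Dict[Tuple[str, Str], Tuple[int, ...]] if False else Dict[Tuple[str, str], Tuple[int, ...]]:
    """Return the counter increase for every entry whose counters grew.

    Decreases (e.g. after 'zpool clear') are ignored.
    """
    grown: Dict[Tuple[str, str], Tuple[int, ...]] = {}
    for key, now in after.items():
        old = before.get(key, (0, 0, 0))
        delta = tuple(n - o for n, o in zip(now, old))
        if any(d > 0 for d in delta):
            grown[key] = delta
    return grown
```

Run from cron, each invocation would save the current `zpool status` output and compare it with the previous snapshot; any non-empty result from `new_errors` identifies the minute, pool and device where the counters moved.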

Is the consensus still that the drives are to blame?
   Cheers.

-- 
Mark Powell - UNIX System Administrator - The University of Salford
Information & Learning Services, Clifford Whitworth Building,
Salford University, Manchester, M5 4WT, UK.
Tel: +44 161 295 6843  Fax: +44 161 295 5888  www.pgp.com for PGP key
Received on Tue Apr 07 2009 - 11:33:18 UTC