RE: 8.0-RELEASE: disk IO temporarily hangs up (ZFS or ATA related problem)

From: Alexander Zagrebin <alexz_at_visp.ru>
Date: Fri, 18 Dec 2009 16:31:34 +0300
> > Maybe the problem is in ata? WD15EADS is a "green" series drive.
> 
> If you don't receive TIMEOUT messages, then commands probably complete
> successfully. Loads above 100% may mean that requests are running for
> a few seconds. It shouldn't happen normally.

Right. There are no errors, but disk throughput is extremely low (just
a few kB/s).
This state starts suddenly and ends just as suddenly (after approx. 10 min).
While it lasts, gstat shows something like this:

dT: 1.008s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      0      0      0    0.0      0      0    0.0    0.0| ad6
    3      2      1    127   5377      1      4   1711  179.1| ad6
    0      0      0      0    0.0      0      0    0.0    0.0| ad6
    1     16      0      0    0.0     16   2038  111.6  177.6| ad6
and so on.
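
The samples above can also be checked mechanically. The following is a minimal Python sketch, assuming gstat output in the column layout shown above has been captured as text; the function names, the dictionary keys and the 1000 ms threshold are illustrative assumptions, not part of gstat.

```python
# Sample rows copied from the gstat output above.
SAMPLE = """\
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      0      0      0    0.0      0      0    0.0    0.0| ad6
    3      2      1    127   5377      1      4   1711  179.1| ad6
    0      0      0      0    0.0      0      0    0.0    0.0| ad6
    1     16      0      0    0.0     16   2038  111.6  177.6| ad6
"""

# Hypothetical key names, one per numeric gstat column, in order.
KEYS = ["queue", "ops", "r_s", "r_kbps", "ms_r", "w_s", "w_kbps", "ms_w", "busy"]


def parse_gstat_line(line):
    """Parse one gstat data row into a dict; return None for any other line."""
    if "|" not in line:
        return None  # header or blank line
    stats, name = line.rsplit("|", 1)
    fields = stats.split()
    if len(fields) != len(KEYS):
        return None
    try:
        values = [float(f) for f in fields]
    except ValueError:
        return None
    row = dict(zip(KEYS, values))
    row["name"] = name.strip()
    return row


def stalled_samples(text, max_ms=1000.0):
    """Return the rows whose read or write latency exceeds max_ms."""
    rows = (parse_gstat_line(line) for line in text.splitlines())
    return [r for r in rows if r and (r["ms_r"] > max_ms or r["ms_w"] > max_ms)]


if __name__ == "__main__":
    for row in stalled_samples(SAMPLE):
        print(row["name"], "ms/r:", row["ms_r"], "ms/w:", row["ms_w"], "%busy:", row["busy"])
```

With the threshold lowered to 100 ms the fourth sample (ms/w 111.6) is flagged as well, so the threshold has to be chosen to match what counts as a stall on the system in question.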

> Have you tried to check the whole drive surface? Have you checked SMART
> reports? I once had a drive which had so many media problems and
> relocated sectors that I was unable to burn a CD from it, so slow it was.
> At the same time there were no bad sectors on that drive reported to the
> OS; the drive handled everything. It was just extremely slow.

# smartctl -A /dev/ad6
smartctl version 5.38 [amd64-portbld-freebsd8.0] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   180   180   021    Pre-fail  Always       -       5958
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       23
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       158
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       18
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       17
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3886
194 Temperature_Celsius     0x0022   115   106   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
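
The reallocation, pending-sector and CRC error counters in this listing are all zero. One raw value that may still be worth relating to the drive's age is Load_Cycle_Count against Power_On_Hours. The following is a minimal Python sketch, assuming `smartctl -A` text in the layout shown above; the function names are illustrative, and whether the resulting rate has anything to do with the stalls is not established here.

```python
# Two rows copied from the listing above, wrapped the way the mail client left them.
SAMPLE = """\
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always
-       158
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always
-       3886
"""


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` style text.

    A continuation line starting with "-" (a row wrapped before the
    WHEN_FAILED column) is joined back onto the previous row first.
    Only rows with a plain integer RAW_VALUE are handled.
    """
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("-") and rows:
            rows[-1] += " " + stripped
        else:
            rows.append(stripped)

    attrs = {}
    for row in rows:
        fields = row.split()
        # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def load_cycles_per_hour(attrs):
    """Average number of head load cycles per powered-on hour."""
    return attrs["Load_Cycle_Count"] / attrs["Power_On_Hours"]


if __name__ == "__main__":
    attrs = parse_smart_attributes(SAMPLE)
    print(round(load_cycles_per_hour(attrs), 1))  # 3886 / 158, about 24.6
```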

> Have you tried to rewrite the _whole_ drive media with dd? Some media
> errors can only be resolved during a write operation.

I have a 2GB swap partition (ad6p2) on this drive.
While the main zfs partition (ad6p3) was offline, I tried the following:
1. Rewriting and reading the partition completely with dd, several times.
   No errors and no delays.
2. Creating a UFS filesystem and using it.
   No errors and no delays (although more testing may be needed).
3. Creating a ZFS pool (`zpool create test /dev/ad6p2`, not mirrored) and
   using it.
   Oops! We have the problem!
   Once the problem has appeared, any operation (dd, for example) on ad6
   is very slow, with the same symptoms.

At this point I have only questions...

What can I do?

-- 
Alexander Zagrebin
Received on Fri Dec 18 2009 - 12:31:37 UTC
