SATA controller headache

From: Paul Bliss <pbliss_at_mechno.com>
Date: Thu, 28 Sep 2006 10:04:16 -0400 (EDT)
  Hello all,

I recently rebuilt my server and made a few upgrades.
I'm running FreeBSD 6.1-RELEASE and I've been having very annoying crashes 
that I think are related to my SATA controller.

I'm using the Promise FastTrak TX2300 controller with a Western 
Digital WD2500KS.

The problem shows up when I execute a command that hits the disk too hard, 
such as "ls -laR /foo".

/var/log/messages then gives me errors that look like this:

Sep 28 01:23:19 helix kernel: ad4: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=310114239
Sep 28 01:23:19 helix kernel: ad4: WARNING - READ_DMA48 UDMA ICRC error (retrying request) LBA=310114239
Sep 28 01:23:28 helix kernel: ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Sep 28 01:23:44 helix kernel: ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Sep 28 01:23:44 helix kernel: ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Sep 28 01:23:44 helix kernel: ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Sep 28 01:23:44 helix kernel: ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
Sep 28 01:23:44 helix kernel: ad4: FAILURE - READ_DMA48 timed out LBA=310114239
Sep 28 01:23:44 helix kernel: g_vfs_done():ad4s1f[READ(offset=42109632512, length=131072)]error = 5
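
To get a quick tally of messages like these, something along the following lines might help. It is only a rough Python sketch, not a tool from the ata(4) driver or the base system: the regular expressions are guessed from the messages quoted above, the names ATA_LINE and summarize are made up for illustration, and it assumes each message sits on a single line, as it does in /var/log/messages itself.

```python
import re
from collections import Counter

# Pattern inferred from the kernel messages quoted above (an assumption,
# not an official format description), e.g.
# "... kernel: ad4: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=310114239"
ATA_LINE = re.compile(
    r"kernel: (?P<dev>ad\d+): (?P<level>TIMEOUT|WARNING|FAILURE) - (?P<detail>.*)$"
)
LBA = re.compile(r"LBA=(\d+)")


def summarize(lines):
    """Count messages per (device, level) and collect every LBA mentioned."""
    counts = Counter()
    lbas = set()
    for line in lines:
        m = ATA_LINE.search(line)
        if not m:
            continue  # not an ata disk message (e.g. the g_vfs_done line)
        counts[(m.group("dev"), m.group("level"))] += 1
        lba = LBA.search(m.group("detail"))
        if lba:
            lbas.add(int(lba.group(1)))
    return counts, lbas


sample = [
    "Sep 28 01:23:19 helix kernel: ad4: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=310114239",
    "Sep 28 01:23:19 helix kernel: ad4: WARNING - READ_DMA48 UDMA ICRC error (retrying request) LBA=310114239",
    "Sep 28 01:23:28 helix kernel: ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly",
    "Sep 28 01:23:44 helix kernel: ad4: FAILURE - READ_DMA48 timed out LBA=310114239",
    "Sep 28 01:23:44 helix kernel: g_vfs_done():ad4s1f[READ(offset=42109632512, length=131072)]error = 5",
]

counts, lbas = summarize(sample)
for (dev, level), n in sorted(counts.items()):
    print(dev, level, n)
print("LBAs:", sorted(lbas))
```

Feeding it the real log (for example, the lines of /var/log/messages read from a file) would show whether the errors cluster on one LBA, as they do in the excerpt above, or are spread across the disk.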


Has anyone else seen this? Any suggestions? I'm tempted to use atacontrol 
to change the mode, but I figured I'd ask the list first.

TIA for the help!
-Paul


P.S.


Output of "smartctl -a /dev/ad4":
smartctl version 5.36 [i386-portbld-freebsd6.1] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD2500KS-00MJB0
Serial Number:    WD-WCANK3148169
Firmware Version: 02.01C03
User Capacity:    250,059,350,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Sep 28 09:58:50 2006 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                         was suspended by an interrupting command from host.
                                         Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                         without error or no self-test has ever
                                         been run.
Total time to complete Offline
data collection:                 (7080) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                         Auto Offline data collection on/off support.
                                         Suspend Offline collection upon new
                                         command.
                                         Offline surface scan supported.
                                         Self-test supported.
                                         Conveyance Self-test supported.
                                         Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                         power-saving mode.
                                         Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                         General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  83) minutes.
Conveyance self-test routine
recommended polling time:        (   6) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   217   217   021    Pre-fail  Always       -       4125
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       29
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       5117
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       28
190 Unknown_Attribute       0x0022   038   026   045    Old_age   Always   FAILING_NOW 62
194 Temperature_Celsius     0x0022   088   076   000    Old_age   Always       -       62
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       9
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
     1        0        0  Not_testing
     2        0        0  Not_testing
     3        0        0  Not_testing
     4        0        0  Not_testing
     5        0        0  Not_testing
Selective self-test flags (0x0):
   After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
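
As a quick way to pick out the attributes smartctl considers failed in a table like the one above, here is a minimal Python sketch. It is not part of smartmontools. The function name flag_attributes is made up for illustration, it assumes each attribute row is on one line with the ten whitespace-separated columns shown in the header, and it applies the usual rule that an attribute has failed when its normalized VALUE is at or below a non-zero THRESH.

```python
def flag_attributes(rows):
    """Return (id, name, value, thresh) for rows where VALUE <= THRESH.

    Rows with THRESH of 000 are skipped, since such attributes have no
    failure threshold. Non-attribute lines (headers, blanks) are ignored.
    """
    flagged = []
    for row in rows:
        fields = row.split()
        # Expect: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name = int(fields[0]), fields[1]
        value, thresh = int(fields[3]), int(fields[5])
        if thresh > 0 and value <= thresh:
            flagged.append((attr_id, name, value, thresh))
    return flagged


sample = [
    "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE",
    "  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0",
    "190 Unknown_Attribute       0x0022   038   026   045    Old_age   Always   FAILING_NOW 62",
    "194 Temperature_Celsius     0x0022   088   076   000    Old_age   Always       -       62",
    "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       9",
]

flagged = flag_attributes(sample)
for attr_id, name, value, thresh in flagged:
    print(f"attribute {attr_id} ({name}): value {value} <= threshold {thresh}")
```

On the sample rows this reports only attribute 190, the one smartctl marks FAILING_NOW; its raw value of 62 is the same as the raw value of Temperature_Celsius (194).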
Received on Thu Sep 28 2006 - 12:04:34 UTC
