Alexandre Sunny wrote:
> On Sun, 13 Sep 2009 22:02:10 +0100
> Kris Kennaway <kris_at_FreeBSD.org> wrote:
>
>
>>Alexander Motin wrote:
>>
>>>Kris Kennaway wrote:
[...]
>>>There are two different kinds of timeouts we can see:
>>> - first one, "ad4: WARNING - ..." is just a queue waiting timeout.
>>>It is not the reason, but consequence of the problem. And I have
>>>doubts that it is reasonable to do it.
>>> - second one, "TIMEOUT - WRITE_DMA48 ..." is a real command
>>>execution timeout. I don't know whether this is result of some
>>>improper error recovery, or you drive indeed lost required servo
>>>information near LBA=344052040 and tries to find it too long. You
>>>can try to read that sector and nearby ones with dd.
>>>
>>
>>It's always that sequence (with setfeatures timing out first, then
>>the dma later)...and the block number varies widely, also whether
>>it's read/write. The disk itself & the data it contains appears to
>>be OK as far as I have been able to determine so far.
>
>
> Does smartctl -A /dev/ad4 report "Seek Error Rate" and/or "ECC Error
> Rate", and, if so, do those values change while errors are being
> reported?
>
> "Replaced Sector Count" or something similar might give some insight
> too.

I have a very similar problem with one disk in a gmirror, but it is on
7.2, not current.

Sep 14 04:48:29 jimi kernel: ad6: timeout waiting to issue command
Sep 14 04:48:29 jimi kernel: ad6: error issuing FLUSHCACHE command
Sep 14 04:48:29 jimi kernel: ad6: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=447001516
Sep 14 04:48:29 jimi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=447001516
Sep 14 04:48:29 jimi kernel: GEOM_MIRROR: Request failed (error=5). ad6[READ(offset=228864776192, length=2048)]
Sep 14 04:48:29 jimi kernel: GEOM_MIRROR: Device gm0: provider ad6 disconnected.
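As a cross-check of the log above: with 512-byte sectors (an assumption, not stated in the log), LBA=447001516 corresponds to byte offset 447001516 * 512 = 228864776192, which is exactly the offset GEOM_MIRROR reports. The small Python sketch below pulls the LBA out of such kernel lines and builds a dd command for reading that sector and a few neighbours, as suggested in the quoted text. The function names and the margin of 8 sectors are illustrative choices, not anything from the ata(4) code.

```python
import re

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

# Matches lines such as
#   "ad6: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=447001516"
LBA_RE = re.compile(r"\b(ad\d+): (TIMEOUT|FAILURE) - (\w+).*?LBA=(\d+)")


def parse_ata_errors(log_text):
    """Return (device, kind, command, lba) for each TIMEOUT/FAILURE line."""
    return [
        (m.group(1), m.group(2), m.group(3), int(m.group(4)))
        for m in LBA_RE.finditer(log_text)
    ]


def lba_to_offset(lba, sector_size=SECTOR_SIZE):
    """Convert a sector number to a byte offset on the device."""
    return lba * sector_size


def dd_command(device, lba, margin=8, sector_size=SECTOR_SIZE):
    """Build a read-only dd command covering the LBA and `margin` sectors
    on each side (hypothetical helper; the margin is arbitrary)."""
    start = max(lba - margin, 0)
    count = lba - start + margin + 1
    return (
        f"dd if=/dev/{device} of=/dev/null "
        f"bs={sector_size} skip={start} count={count}"
    )
```

For the failing request above, dd_command("ad6", 447001516) yields a command that reads sectors 447001508 through 447001524 and discards the data, so it only tests whether the reads complete.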
But there are no errors in the SMART log:

Device Model:     Hitachi HDP725050GLA360
Firmware Version: GM4OA52A
User Capacity:    500,107,862,016 bytes

SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       1
  2 Throughput_Performance  0x0005   130   130   054    Pre-fail  Offline      -       151
  3 Spin_Up_Time            0x0007   116   116   024    Pre-fail  Always       -       312 (Average 350)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       23
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   129   129   020    Pre-fail  Offline      -       30
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       13911
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       23
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       545
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       545
194 Temperature_Celsius     0x0002   240   240   000    Old_age   Always       -       25 (Lifetime Min/Max 20/34)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

As has been discussed many times, this should be fixed by increasing the
hardcoded timeouts. Is it time to make the ATA timeouts sysctl tunables?

There were patches from FreeNAS and some PRs about longer timeouts.

kern/136182: [ata] Heavy disk writes (e.g. ZFS resilver to a drive) can cause "adX: TIMEOUT - FLUSHCACHE retrying (1 retry left)" on console.
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/136182

kern/111023: [ata] [request] [patch] please expand ata timeouts
http://www.freebsd.org/cgi/query-pr.cgi?pr=111023

ATA/SATA DMA timeout issues
http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting#line-53

HowTo: Fix SATA DMA timeout issues on FreeBSD
http://linux-bsd-sharing.blogspot.com/2009/03/howto-fix-sata-dma-timeout-issues-on.html

Western Digital hard disks and ATA timeouts
http://www.mail-archive.com/freebsd-hardware_at_freebsd.org/msg03135.html

ata FLUSHCACHE timeout errors? [patch]
http://lists.freebsd.org/pipermail/freebsd-current/2009-April/005939.html

And I am sure, you can find many more reports floating around.

Miroslav Lachman

Received on Mon Sep 14 2009 - 13:21:42 UTC
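On the question quoted at the top of this message, whether the SMART values change while errors are being reported: one way to check is to save the output of smartctl -A before and after an error burst and compare the RAW_VALUE column. The following Python sketch does that comparison. It assumes the attribute table layout shown in this message (ten whitespace-separated columns, RAW_VALUE last); the function names are made up for illustration.

```python
def parse_smart_attributes(text):
    """Map attribute name -> integer raw value from `smartctl -A` output.

    Only the first token of RAW_VALUE is used, so "312 (Average 350)"
    is read as 312. Rows whose raw value is not an integer are skipped.
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have >= 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue
    return attrs


def changed_attributes(before, after):
    """Return {name: (old, new)} for attributes whose raw value differs."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }
```

Counters such as Power_On_Hours will naturally differ between snapshots; the interesting ones here would be Reallocated_Sector_Ct, Current_Pending_Sector, Seek_Error_Rate and UDMA_CRC_Error_Count.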