Ion-Mihai,

For more information on smartmontools (smartctl, smartd), check out the
SourceForge site, http://smartmontools.sourceforge.net

If you have specific questions, you can email the support list (link on
the page above).

Ed

On Mon, 2004-10-11 at 07:09, Ion-Mihai Tetcu wrote:
> [ please reply only on questions_at_ if this is not appropriate for current_at_ ]
>
> Hi,
>
> While doing nothing special the system started printing TIMEOUT -
> WRITE_DMA errors and eventually, after an atacontrol mode 0 PIO4 PIO4,
> hung completely at 04:20.
>
> After the restart I got a few more TIMEOUTs but no hang, even though the
> machine is idle.
>
> SMART was enabled as seen below, but smartd wasn't running (stupid, huh
> :-/ ).
>
> Obvious question: is the hdd dying ?
>
> Second question, as I'm not familiar with SMART: how much can one trust
> SMART reports ?
>
> Third question: could you suggest some settings for smartd ? I'm asking
> this because I don't fully understand the man pages for smartctl and
> smartd; a link explaining more about SMART would also be appreciated.
>
>
> System details:
>
> Local system status (last daily mail):
> 3:01AM up 2 days, 11:56, 2 users, load averages: 1.04, 1.07, 0.95
>
> % uname -a
> FreeBSD it.buh.cameradicommercio.ro 5.3-BETA7 FreeBSD 5.3-BETA7 #3: Mon Oct 4 21:57:25 EEST 2004 root_at_it.buh.tecnik93.com:/usr/obj/usr/src/sys/IT53_d i386
>
> Oct 11 04:06:51 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=186210020
> Oct 11 04:07:02 it kernel: ata0: reiniting channel ..
> Oct 11 04:07:02 it kernel: ata0: reset tp1 mask=03 ostat0=d0 ostat1=d0
> Oct 11 04:07:02 it kernel: ad0: stat=0xd0 err=0xd0 lsb=0xd0 msb=0xd0
> Oct 11 04:07:02 it last message repeated 95 times
> Oct 11 04:07:02 it kernel: ad0: stat=0x50 err=0x01 lsb=0x00 msb=0x00
> Oct 11 04:07:02 it kernel: ata0-slave: stat=0x00 err=0x01 lsb=0x00 msb=0x00
> Oct 11 04:07:02 it kernel: ata0: reset tp2 stat0=50 stat1=00 devices=0x1<ATA_MASTER>
> Oct 11 04:07:02 it kernel: ata0: resetting done ..
> Oct 11 04:07:02 it kernel: ad0: pio=0x0c wdma=0x22 udma=0x45 cable=80pin
> Oct 11 04:07:02 it kernel: ad0: setting PIO4 on VIA 8235 chip
> Oct 11 04:07:02 it kernel: ad0: setting UDMA100 on VIA 8235 chip
> Oct 11 04:07:02 it kernel: ata0: device config done ..
> Oct 11 04:07:16 it kernel: (probe0:ata0:0:0:0): error 22
> Oct 11 04:07:16 it kernel: (probe0:ata0:0:0:0): Unretryable Error
> Oct 11 04:07:16 it kernel: (probe1:ata0:0:1:0): error 22
> Oct 11 04:07:16 it kernel: (probe1:ata0:0:1:0): Unretryable Error
> .........
>
> # grep LBA /var/log/messages
> Oct 11 04:06:51 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=186210020
> Oct 11 04:07:52 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=165839908
> Oct 11 04:08:48 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=165849220
> Oct 11 04:09:12 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=165851556
> Oct 11 04:09:32 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=165859748
> Oct 11 04:10:44 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=6343103
> Oct 11 04:11:23 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=186210916
> Oct 11 04:11:36 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=186211044
> Oct 11 04:11:58 it kernel: acd0: FAILURE - ATA_IDENTIFY status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=0
> Oct 11 04:13:21 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=309294340
> Oct 11 04:14:00 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=175421156
> Oct 11 04:14:24 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=175421156
> Oct 11 04:15:04 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=175421796
> Oct 11 04:15:48 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=130261540
> Oct 11 04:16:10 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=175421892
> Oct 11 04:16:53 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=173918724
> Oct 11 04:18:50 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=309924420
> Oct 11 04:19:14 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=4920283
> Oct 11 04:40:00 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=4918975
> Oct 11 04:40:56 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=6067199
> Oct 11 10:46:52 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=6343103
>
> # grep sw /var/log/messages
> Oct 11 04:14:24 it kernel: swap_pager: indefinite wait buffer: device: ad0s1e, blkno: 14841, size: 4096
> Oct 11 04:14:24 it kernel: swap_pager: indefinite wait buffer: device: ad0s3d, blkno: 14381, size: 4096
> Oct 11 04:16:53 it kernel: swap_pager: indefinite wait buffer: device: ad0s3d, blkno: 60732, size: 4096
> Oct 11 04:16:53 it kernel: swap_pager: indefinite wait buffer: device: ad0s3d, blkno: 33481, size: 4096
> Oct 11 04:16:53 it kernel: swap_pager: indefinite wait buffer: device: ad0s3d, blkno: 33488, size: 4096
>
>
>
> The disk is:
> # atacontrol cap 0 0
> ATA channel 0, Master, device ad0:
>
> Protocol              ATA/ATAPI revision 6
> device model          WDC WD1600JB-00EVA0
> serial number         WD-WCAEK1298992
> firmware revision     15.05R15
> cylinders             16383
> heads                 16
> sectors/track         63
> lba supported         268435455 sectors
> lba48 supported       312579695 sectors
> dma supported
> overlap not supported
>
> Feature                        Support  Enable  Value       Vendor
> write cache                    yes      no
> read ahead                     yes      yes
> dma queued                     no       no      0/0x00
> SMART                          yes      yes
> microcode download             yes      yes
> security                       yes      no
> power management               yes      yes
> advanced power management      no       no      0/0x00
> automatic acoustic management  yes      yes     254/0xFE    128/0x80
>
> # smartctl -a /dev/ad0
> smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Device Model:     WDC WD1600JB-00EVA0
> Serial Number:    WD-WCAEK1298992
> Firmware Version: 15.05R15
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   6
> ATA Standard is:  Exact ATA specification draft version not indicated
> Local Time is:    Mon Oct 11 12:37:32 2004 EEST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> The SMART RETURN STATUS return value (smartmontools -H option/Directive)
> can not be retrieved with this version of ATAng, please do not rely on this value
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x05) Offline data collection activity
>                                         was aborted by an interrupting command from host.
>                                         Auto Offline Data Collection: Disabled.
> Self-test execution status:      (  40) The self-test routine was interrupted
>                                         by the host with a hard or soft reset.
> Total time to complete Offline
> data collection:                 (5061) seconds.
> Offline data collection
> capabilities:                    (0x79) SMART execute Offline immediate.
>                                         No Auto Offline data collection support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         No General Purpose Logging support.
> Short self-test routine
> recommended polling time:        (   2) minutes.
> Extended self-test routine
> recommended polling time:        (  67) minutes.
> Conveyance self-test routine
> recommended polling time:        (   5) minutes.
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0007   155   147   021    Pre-fail  Always       -       2775
>   4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       464
>   5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       8
>   7 Seek_Error_Rate         0x000b   200   199   051    Pre-fail  Always       -       0
>   9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3360
>  10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
>  11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       462
> 194 Temperature_Celsius     0x0022   124   253   000    Old_age   Always       -       26
> 196 Reallocated_Event_Count 0x0032   194   194   000    Old_age   Always       -       6
> 197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
> 199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       2
> 200 Multi_Zone_Error_Rate   0x0009   200   155   051    Pre-fail  Offline      -       0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended captive    Interrupted (host reset)  80%        77               -
> # 2  Extended offline    Aborted by host           90%        77               -
> # 3  Conveyance offline  Completed without error   00%        76               -
> # 4  Short offline       Completed without error   00%        76               -
> # 5  Conveyance offline  Completed without error   00%        233              -
> # 6  Short captive       Interrupted (host reset)  90%        233              -
>
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
>
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
>   If Selective self-test is pending on power-up, resume after 0 minute delay.
>
>
> Thanks,
--
Eduard Martinescu <martines_at_rochester.rr.com>

Received on Mon Oct 11 2004 - 10:19:15 UTC
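
For anyone reading the attribute table quoted above, here is a minimal Python sketch of one way to scan it. It relies on the rule from the smartctl man page that an attribute has failed when its normalized VALUE is less than or equal to a nonzero THRESH. The function names, the WATCH set of attribute IDs, and the choice of sample rows are assumptions made for illustration; none of this is part of smartmontools, and a nonzero raw count is a hint to keep watching, not a verdict on the drive.

```python
import re

# A few rows copied from the smartctl -a output quoted in this thread.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       8
196 Reallocated_Event_Count 0x0032   194   194   000    Old_age   Always       -       6
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       2
"""

# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

# Assumption: attribute IDs whose raw counts are worth a second look.
WATCH = {5, 196, 197, 198, 199}


def parse_attributes(text):
    """Turn attribute table rows into dictionaries; skip non-matching lines."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "raw": int(m.group(9)),
            })
    return rows


def review(rows):
    """Return one note per attribute that has failed or has a nonzero watched count."""
    notes = []
    for r in rows:
        if r["thresh"] > 0 and r["value"] <= r["thresh"]:
            notes.append(f"{r['name']}: value {r['value']} <= threshold {r['thresh']} (failed)")
        elif r["id"] in WATCH and r["raw"] > 0:
            notes.append(f"{r['name']}: raw count {r['raw']} is nonzero")
    return notes


rows = parse_attributes(SAMPLE)
notes = review(rows)
for note in notes:
    print(note)
```

To use it on a live system, the output of smartctl -A for the device could be passed to parse_attributes in place of SAMPLE.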