On 12 Oct 2005 at 19:58, Mike Tancsa wrote:

> At 05:48 PM 12/10/2005, Dan Langille wrote:
> >I'm seeing these errors but I do not know if it's an HDD problem
> >or an OS problem. Clues please?
>
> They look like hard errors, but I have seen similar problems with bad
> drive trays. smartmontools out of the ports will help you narrow it
> down. (eg check the output of smartctl -a /dev/ad0).

We did that yesterday. I don't know enough about the output to judge,
but it seems ok. Also posted to http://pastebin.com/391872

[root_at_mtwenty:/usr/ports/sysutils/smartmontools] # smartctl -a /dev/ad0
smartctl version 5.33 [i386-portbld-freebsd6.0] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     Maxtor 6Y080L0
Serial Number:    Y3KLWA7E
Firmware Version: YAR41BW0
User Capacity:    81,964,302,336 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Tue Oct 11 08:45:22 2005 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 182) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  40) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   200   200   063    Pre-fail  Always       -       16714
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       77
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   251   247   187    Pre-fail  Always       -       36405
  9 Power_On_Minutes        0x0032   243   243   000    Old_age   Always       -       317h+56m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       84
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       36
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       3036
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   198   196   000    Old_age   Offline      -       4
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       4
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   198   198   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 4
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 4 occurred at disk power-on lifetime: 3332 hours (138 days + 20 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 38 1f a2 e0  Error: ICRC, ABRT at LBA = 0x00a21f38 = 10624824

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 37 38 1f a2 e0 08      00:06:15.120  READ DMA
  c8 00 09 2f 1f a2 e0 08      00:06:15.120  READ DMA
  c8 00 36 f9 1e a2 e0 08      00:06:15.120  READ DMA
  c8 00 0a ef 1e a2 e0 08      00:06:15.120  READ DMA
  c8 00 35 ba 1e a2 e0 08      00:06:15.120  READ DMA

Error 3 occurred at disk power-on lifetime: 3332 hours (138 days + 20 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 ba 1e a2 e0  Error: ICRC, ABRT at LBA = 0x00a21eba = 10624698

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 35 ba 1e a2 e0 08      00:06:15.056  READ DMA
  c8 00 0b af 1e a2 e0 08      00:06:15.056  READ DMA
  c8 00 34 7b 1e a2 e0 08      00:06:15.056  READ DMA
  c8 00 0c 6f 1e a2 e0 08      00:06:15.056  READ DMA
  c8 00 02 6f 1e a2 e0 08      00:06:15.056  READ DMA

Error 2 occurred at disk power-on lifetime: 3332 hours (138 days + 20 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 c1 aa a2 e0  Error: ICRC, ABRT at LBA = 0x00a2aac1 = 10660545

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 3e c1 aa a2 e0 08      00:06:14.880  READ DMA
  c8 00 02 bf aa a2 e0 08      00:06:14.880  READ DMA
  c8 00 34 0b 3b 53 e0 08      00:06:14.880  READ DMA
  c8 00 0c ff 3a 53 e0 08      00:06:14.880  READ DMA
  c8 00 01 7e 00 00 e0 08      00:06:14.880  READ DMA

Error 1 occurred at disk power-on lifetime: 3332 hours (138 days + 20 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 79 96 0e e0  Error: ICRC, ABRT at LBA = 0x000e9679 = 956025

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 16 79 96 0e e0 08      00:06:14.736  READ DMA
  c8 00 2a 4f 96 0e e0 08      00:06:14.736  READ DMA
  c8 00 02 33 54 53 e0 08      00:06:14.736  READ DMA
  c8 00 08 f7 aa a2 e0 08      00:06:14.736  READ DMA
  c8 00 08 f7 aa a2 e0 08      00:06:14.736  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3370         -
# 2  Short offline       Completed without error       00%         7         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[root_at_mtwenty:/usr/ports/sysutils/smartmontools] #

>
> ---Mike
>
>
> >The following was also posted at http://pastebin.com/391670
> >
> >Oct 11 03:40:00 mtwenty kernel: ad0: FAILURE - READ_DMA
> >status=7f<READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR>
> >error=7f<UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA
> >,ILLEGAL_LENGTH> LBA=802719
> >Oct 11 03:40:00 mtwenty kernel:
> >g_vfs_done():ad0s1a[READ(offset=410959872, length=16384)]error = 5
> >Oct 11 03:40:06 mtwenty kernel: ad0: FAILURE - READ_DMA
> >status=7f<READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR>
> >error=7f<UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA
> >,ILLEGAL_LENGTH> LBA=802175
> >Oct 11 03:40:06 mtwenty kernel:
> >g_vfs_done():ad0s1a[READ(offset=410681344, length=8192)]error = 5
> >Oct 11 03:40:06 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1
> >retry left) LBA=4857391
> >Oct 11 03:40:01 mtwenty cron[82160]: login_getclass: retrieving
> >class information: Input/output error
> >Oct 11 03:44:49 mtwenty kernel: ad0: FAILURE - READ_DMA
> >status=7f<READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR>
> >error=7f<UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA
> >,ILLEGAL_LENGTH> LBA=151787983
> >Oct 11 03:44:49 mtwenty kernel:
> >g_vfs_done():ad0s1f[READ(offset=74097885184, length=14336)]error = 5
> >Oct 11 03:44:56 mtwenty kernel: ad0: FAILURE - WRITE_DMA
> >status=7f<READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR>
> >error=7f<UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA
> >,ILLEGAL_LENGTH> LBA=4857391
> >Oct 11 03:44:56 mtwenty kernel:
> >g_vfs_done():ad0s1d[WRITE(offset=969719808, length=10240)]error = 5
> >Oct 11 03:44:56 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1
> >retry left) LBA=92997387
> >Oct 11 03:55:07 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1
> >retry left) LBA=4092687
> >Oct 11 13:04:08 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1
> >retry left) LBA=4092687
> >Oct 11 13:52:08 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1
> >retry left) LBA=4092687
> >Oct 11 13:55:07 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1
> >retry left) LBA=4092687
> >Oct 11 13:55:33 mtwenty kernel: ad0: timeout waiting to issue command
> >Oct 11 13:55:33 mtwenty kernel: ad0: error issueing WRITE_DMA command
> >Oct 11 13:55:33 mtwenty kernel: ad0: timeout waiting to issue command
> >Oct 11 13:55:33 mtwenty kernel: ad0: error issueing WRITE_DMA command
> >Oct 11 13:55:33 mtwenty kernel: ad0: timeout waiting to issue command
> >Oct 11 13:55:33 mtwenty kernel: ad0: error issueing WRITE_DMA command
> >Oct 11 13:55:33 mtwenty kernel: ad0: timeout waiting to issue command
> >Oct 11 13:55:33 mtwenty kernel: ad0: error issueing WRITE_DMA command
> >Oct 11 13:55:33 mtwenty kernel:
> >g_vfs_done():ad0s1f[WRITE(offset=42777804800, length=16384)]error = 5
> >Oct 11 13:55:33 mtwenty kernel:
> >g_vfs_done():ad0s1f[WRITE(offset=43163189248, length=16384)]error = 5
> >Oct 11 13:55:33 mtwenty kernel:
> >g_vfs_done():ad0s1a[WRITE(offset=131072, length=16384)]error = 5
> >Oct 11 13:55:33 mtwenty kernel:
> >g_vfs_done():ad0s1a[WRITE(offset=147456, length=16384)]error = 5
> >Oct 11 13:55:38 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1
> >retry left) LBA=786815
> >Oct 11 15:44:31 mtwenty shutdown: reboot by dan:
> >
> >Oct 11 16:13:03 mtwenty su: dan to root on /dev/ttyp1
> >Oct 11 19:51:04 mtwenty kernel: ad0: timeout waiting to issue command
> >Oct 11 19:51:09 mtwenty kernel: ad0: error issueing WRITE_DMA command
> >Oct 11 19:51:09 mtwenty kernel: ad0: timeout waiting to issue command
> >Oct 11 19:51:09 mtwenty kernel: ad0: error issueing WRITE_DMA command
> >Oct 11 19:51:09 mtwenty kernel:
> >g_vfs_done():ad0s1f[WRITE(offset=49576368128, length=2048)]error = 5
> >Oct 11 19:51:09 mtwenty kernel:
> >g_vfs_done():ad0s1f[WRITE(offset=49767104512, length=16384)]error = 5
> >Oct 11 19:51:09 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1
> >retry left) LBA=104266895
> >Oct 11 20:17:45 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1
> >retry left) LBA=319
> >Oct 12 17:23:37 mtwenty syslogd: kernel boot file is /boot/kernel/kernel
> >Oct 12 17:23:37 mtwenty kernel:
> >g_vfs_done():ad0s1d[WRITE(offset=969867264, length=8192)]error = 6
> >Oct 12 17:23:37 mtwenty kernel:
> >g_vfs_done():ad0s1d[WRITE(offset=963559424, length=16384)]error = 6
> >Oct 12 17:23:37 mtwenty kernel: unknown: TIMEOUT - READ_DMA retrying
> >(0 retries left) LBA=153118463
> >Oct 12 17:23:37 mtwenty kernel: unknown: FAILURE - READ_DMA timed
> >out LBA=153118463
> >Oct 12 17:23:37 mtwenty kernel:
> >g_vfs_done():ad0s1f[READ(offset=74779090944, length=2048)]error = 5
> >Oct 12 17:23:37 mtwenty kernel:
> >g_vfs_done():ad0s1f[READ(offset=74779097088, length=2048)]error = 6
> >Oct 12 17:23:37 mtwenty kernel:
> >g_vfs_done():ad0s1f[READ(offset=74202345472, length=2048)]error = 6
> >Oct 12 17:23:37 mtwenty kernel:
> >g_vfs_done():ad0s1f[READ(offset=75589498880, length=2048)]error = 6
> >
> >Thanks
> >--
> >Dan Langille : http://www.langille.org/
> >BSDCan - The Technical BSD Conference - http://www.bsdcan.org/
> >
> >
> >_______________________________________________
> >freebsd-current_at_freebsd.org mailing list
> >http://lists.freebsd.org/mailman/listinfo/freebsd-current
> >To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
>
>

-- 
Dan Langille : http://www.langille.org/
BSDCan - The Technical BSD Conference - http://www.bsdcan.org/

Received on Thu Oct 13 2005 - 00:55:26 UTC
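The register dumps in the SMART error log and the status/error bitmasks in the kernel messages can be decoded by hand. Below is a minimal Python sketch of that decoding, under stated assumptions: the bit names for 0x01 through 0x40 are copied from the FreeBSD messages quoted above, "ICRC" for error bit 0x80 follows smartctl's label, and "BUSY" for status bit 0x80 is an assumption. The mapping from a g_vfs_done() byte offset to an absolute LBA assumes 512-byte sectors, that slice ad0s1 starts at sector 63, and that ad0s1a starts at offset 0 within the slice. All helper names are hypothetical.

```python
"""Decode ATA register values and bitmasks like those quoted above.

A sketch under stated assumptions; every helper name here is hypothetical.
"""

# Error-register bits, highest first. Names for 0x01-0x40 are copied from
# the FreeBSD kernel messages; "ICRC" (0x80) is smartctl's label.
ERROR_BITS = [
    (0x80, "ICRC"),
    (0x40, "UNCORRECTABLE"),
    (0x20, "MEDIA_CHANGED"),
    (0x10, "NID_NOT_FOUND"),
    (0x08, "MEDIA_CHANGE_REQEST"),  # spelled exactly as the kernel prints it
    (0x04, "ABORTED"),
    (0x02, "NO_MEDIA"),
    (0x01, "ILLEGAL_LENGTH"),
]

# Status-register bits, highest first. Names for 0x01-0x40 come from the
# kernel messages; the name used for 0x80 is an assumption.
STATUS_BITS = [
    (0x80, "BUSY"),
    (0x40, "READY"),
    (0x20, "DMA_READY"),
    (0x10, "DSC"),
    (0x08, "DRQ"),
    (0x04, "CORRECTABLE"),
    (0x02, "INDEX"),
    (0x01, "ERROR"),
]

SECTOR_SIZE = 512  # assumption: 512-byte sectors
SLICE_START = 63   # assumption: slice ad0s1 starts at absolute sector 63


def decode_bits(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in table if value & mask]


def lba28(sn, cl, ch, dh):
    """Rebuild a 28-bit LBA from the SN, CL, CH and DH registers.

    Bits 0-7 come from SN, 8-15 from CL, 16-23 from CH and 24-27 from the
    low nibble of DH; DH's high nibble carries mode/device flags, not address.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def partition_offset_to_lba(offset, partition_start=0):
    """Map a g_vfs_done() byte offset inside a partition to an absolute LBA.

    partition_start is the partition's first sector relative to the slice
    (assumed to be 0 for ad0s1a).
    """
    return SLICE_START + partition_start + offset // SECTOR_SIZE


if __name__ == "__main__":
    # "Error 4" row: ER=84 ST=51 SC=00 SN=38 CL=1f CH=a2 DH=e0
    print(decode_bits(0x84, ERROR_BITS))       # ['ICRC', 'ABORTED']
    print(decode_bits(0x51, STATUS_BITS))      # ['READY', 'DSC', 'ERROR']
    print(lba28(0x38, 0x1F, 0xA2, 0xE0))       # 10624824
    # First ad0s1a READ failure: offset=410959872, kernel printed LBA=802719
    print(partition_offset_to_lba(410959872))  # 802719
```

Under the sector-63 assumption, both ad0s1a read failures (offsets 410959872 and 410681344) map to the LBAs the kernel printed (802719 and 802175). The ad0s1d and ad0s1f offsets cannot be mapped this way without their partition start sectors from bsdlabel.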