Re: ata timeouts under load

From: Sean C. Farley <scf_at_FreeBSD.org>
Date: Mon, 14 Sep 2009 11:51:39 -0500 (CDT)
On Mon, 14 Sep 2009, Mike Tancsa wrote:

> At 11:21 AM 9/14/2009, Miroslav Lachman wrote:
>
>> I have a very similar problem with one disk in gmirror, but it is on 7.2,
>> not -CURRENT.
>
>> Sep 14 04:48:29 jimi kernel: ad6: timeout waiting to issue command
>> Sep 14 04:48:29 jimi kernel: ad6: error issuing FLUSHCACHE command
>> Sep 14 04:48:29 jimi kernel: ad6: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=447001516
>> Sep 14 04:48:29 jimi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=447001516
>
> Are you sure this is not just a bad cable? I have had similar symptoms
> that turned out to be caused by a bad cable.  If possible, swap the cable
> between the two disks and see whether the problem follows the cable.

I also have the same (or a similar) problem with 7.2 and earlier.  I have
replaced both the cable and the drive.  Replacing the drive changed the LBA
reported in the timeout; otherwise, the LBA never changes.  Extended offline
self-tests complete without errors.

Timeout message:
kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=43471743

I use this in /boot/loader.conf to help (I hope) keep the timeout from
breaking the mirror:
kern.geom.mirror.timeout=45

Reading that region with dd does not reproduce the timeout, but that may be
explained by this error, which I just noticed in the drive's SMART error log:

Error 9 occurred at disk power-on lifetime: 13578 hours (565 days + 18 hours)
   When the command that caused the error occurred, the device was in an unknown state.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 59 11 8e 53 97 e2  Error: UNC 17 sectors at LBA = 0x0297538e = 43471758

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   c8 02 20 7f 53 97 e2 97      00:04:48.074  READ DMA
   c8 02 20 5f 53 97 e2 97      00:04:48.062  READ DMA
   c8 02 20 3f 53 97 e2 97      00:04:48.050  READ DMA
   c8 02 04 43 6e c5 e2 c5      00:04:48.029  READ DMA
   c8 02 20 ff d6 8b e2 8b      00:04:48.016  READ DMA
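The register dump can be cross-checked against the kernel message.  The
sketch below assumes the usual 28-bit LBA register layout (SN holds bits
0-7, CL bits 8-15, CH bits 16-23, and the low nibble of DH bits 24-27) and
decodes the most recent READ DMA before the error (the first row above)
together with the error registers; `lba28` is a hypothetical helper name.

```python
# Decode 28-bit LBAs from the ATA register bytes printed in the SMART error
# log above. Assumed layout: SN = bits 0-7, CL = bits 8-15, CH = bits 16-23,
# low nibble of DH = bits 24-27.

def lba28(sn: int, cl: int, ch: int, dh: int) -> int:
    """Assemble a 28-bit LBA from the SN, CL, CH and DH register values."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

# Most recent command: "c8 02 20 7f 53 97 e2 97  READ DMA"
# (CR FR SC SN CL CH DH DC); SC = 0x20 sectors requested.
start = lba28(sn=0x7F, cl=0x53, ch=0x97, dh=0xE2)
count = 0x20
end = start + count  # first LBA past the end of the request

# Error registers: "40 59 11 8e 53 97 e2" (ER ST SC SN CL CH DH)
bad = lba28(sn=0x8E, cl=0x53, ch=0x97, dh=0xE2)
remaining = 0x11  # SC value reported in the error registers

print(f"READ DMA covered LBA {start}..{end - 1} ({count} sectors)")
print(f"UNC reported at LBA {bad}, {bad - start} sectors into the request")
print(f"sectors from bad LBA to end of request: {end - bad} (error SC = {remaining})")
```

Under that assumption, the failing READ DMA starts at 43471743, the same
LBA as the ad0 timeout message, and requests 32 sectors; the UNC sector
43471758 lies 15 sectors into that request, and the 17 sectors from there
to the end of the request match SC=0x11 in the error registers.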

Does this error mean that the drive has remapped the block?  I thought,
however, that remapping only happens when the block is written, yes?  Is
there a safe way to write to a specific block?  Would it be safe to read the
block with dd and write it back?  Of course, the drive would be detached from
the mirror at the time.
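A minimal Python sketch of the read-and-write-back idea, assuming 512-byte
logical sectors and that `path` names the drive's raw device (for example
/dev/ad0) after it has been detached from the mirror.  `rewrite_sector` and
`unreadable_sectors` are hypothetical helpers, not part of any FreeBSD tool.
Writing back only preserves data if the read succeeds: if the sector really
is unreadable (UNC), this disk has no good copy to write back, and the data
would have to come from the other half of the mirror.

```python
import os

SECTOR = 512  # assumed logical sector size; check the drive's actual value

def unreadable_sectors(path: str, first_lba: int, count: int,
                       sector_size: int = SECTOR) -> list:
    """Read sectors one at a time and return the LBAs whose read fails."""
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for lba in range(first_lba, first_lba + count):
            try:
                os.pread(fd, sector_size, lba * sector_size)
            except OSError:  # e.g. EIO on an unreadable disk sector
                bad.append(lba)
    finally:
        os.close(fd)
    return bad

def rewrite_sector(path: str, lba: int, sector_size: int = SECTOR) -> bytes:
    """Read one sector at `lba` and write the same bytes back in place.

    Raises OSError if the read fails or is short; nothing is written then.
    """
    offset = lba * sector_size  # sector-aligned, as raw disk devices require
    fd = os.open(path, os.O_RDWR)
    try:
        data = os.pread(fd, sector_size, offset)
        if len(data) != sector_size:
            raise OSError(f"short read at LBA {lba}: {len(data)} bytes")
        written = os.pwrite(fd, data, offset)
        if written != sector_size:
            raise OSError(f"short write at LBA {lba}: {written} bytes")
        return data
    finally:
        os.close(fd)
```

Scanning one sector at a time (rather than 32-sector requests) narrows a
failure down to a single LBA before anything is written.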

Sean
-- 
scf_at_FreeBSD.org
Received on Mon Sep 14 2009 - 14:51:42 UTC