Re: ata timeouts under load

From: Andriy Gapon <avg_at_icyb.net.ua>
Date: Mon, 14 Sep 2009 13:44:19 +0300
on 14/09/2009 00:02 Kris Kennaway said the following:
> 
> It's always that sequence (with setfeatures timing out first, then the
> dma later)...and the block number varies widely, also whether it's
> read/write.  The disk itself & the data it contains appears to be OK as
> far as I have been able to determine so far.

I also sometimes see something similar when I put very high load with a specific
pattern on my two-disk mirrored zpool.
The pattern is zpool scrub plus additional load like untarring large archives.

Example:
kernel: ad10: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
completing request directly
kernel: ad10: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
completing request directly
kernel: ad10: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing
request directly
kernel: ad10: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing
request directly
kernel: ad10: WARNING - SET_MULTI taskqueue timeout - completing request directly
kernel: ad10: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=568158815
kernel: ad10: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=568159071
kernel: ad10: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
completing request directly
kernel: ad10: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
completing request directly
kernel: ad10: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing
request directly
kernel: ad10: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing
request directly
kernel: ad10: WARNING - SET_MULTI taskqueue timeout - completing request directly
kernel: ad10: TIMEOUT - READ_DMA48 retrying (0 retries left) LBA=568158815
kernel: ad10: TIMEOUT - READ_DMA48 retrying (0 retries left) LBA=568159071
kernel: ad10: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
completing request directly
kernel: ad10: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
completing request directly
kernel: ad10: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing
request directly
kernel: ad10: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing
request directly
kernel: ad10: WARNING - SET_MULTI taskqueue timeout - completing request directly
kernel: ad10: FAILURE - READ_DMA48 timed out LBA=568158815
kernel: ad10: FAILURE - READ_DMA48 timed out LBA=568159071
root: ZFS: vdev I/O failure, zpool=tank path=/dev/ad10s2d offset=284457041920
size=131072 error=5
root: ZFS: vdev I/O failure, zpool=tank path=/dev/ad10s2d offset=284456910848
size=131072 error=5


But I also see cases where dma timeout message appears first:
ad10: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE>
LBA=568157535
ad10: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing
request directly
ad10: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing
request directly
ad10: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request
directly
ad10: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request
directly
ad10: WARNING - SET_MULTI taskqueue timeout - completing request directly
ad10: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=568158559
ad10: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=568158815

No errors happen whatsoever if I run scrub without any additional load, or if I do
any 'typical' disk loads without parallel scrubbing.

-- 
Andriy Gapon
Received on Mon Sep 14 2009 - 08:44:23 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:55 UTC