Re: Another ZFS kernel panic on same block on every drive in raidz

From: Mark Powell <M.S.Powell_at_salford.ac.uk>
Date: Thu, 30 Aug 2007 19:13:25 +0100 (BST)
On Thu, 30 Aug 2007, Mark Powell wrote:

>  I am being told that a DMA error is occurring on the same block on all 3 
> drives at the same time:
>
>  Just performing a scrub now to see what happens.

The scrub completed fine.
   The panic occurs under heavyish use, with 3 simultaneous rsyncs from 
an XP box over Samba.
   I just recalled that it panicked earlier, but I was in X and couldn't see 
the message. Surprisingly, it did log something:

Aug 30 17:27:48 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435298
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435298
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad14: FAILURE - WRITE_DMA timed out LBA=268435298
Aug 30 17:28:29 echo kernel: ad18: FAILURE - WRITE_DMA timed out LBA=268435297
Aug 30 17:28:29 echo kernel: ad16: FAILURE - WRITE_DMA timed out LBA=268435297
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435298
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435298
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad18: FAILURE - WRITE_DMA timed out LBA=268435297
Aug 30 17:28:29 echo kernel: ad14: FAILURE - WRITE_DMA timed out LBA=268435298
Aug 30 17:28:29 echo kernel: ad16: FAILURE - WRITE_DMA timed out LBA=268435297
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435425
Aug 30 17:28:29 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435426
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435425
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435425
Aug 30 17:28:29 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435426
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435425
Aug 30 17:28:29 echo kernel: ad18: FAILURE - WRITE_DMA timed out LBA=268435425
Aug 30 17:28:29 echo kernel: ad14: FAILURE - WRITE_DMA timed out LBA=268435426
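
To pull the failing blocks out of a log like the one above, something along these lines may help. It is only a rough sketch: the regex and the helper name failed_lbas are hypothetical, written against the message format quoted here and not taken from any driver source or tool.

```python
import re
from collections import defaultdict

# Hypothetical pattern, fitted to the kernel messages quoted in this post.
LINE_RE = re.compile(
    r"kernel: (?P<dev>ad\d+): (?P<kind>TIMEOUT|FAILURE) - (?P<cmd>\w+) .*LBA=(?P<lba>\d+)"
)

def failed_lbas(lines):
    """Map each device to the sorted distinct LBAs it reported FAILURE on."""
    per_dev = defaultdict(set)
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("kind") == "FAILURE":
            per_dev[m.group("dev")].add(int(m.group("lba")))
    return {dev: sorted(lbas) for dev, lbas in per_dev.items()}

# A few lines copied from the log above, as sample input.
sample = [
    "Aug 30 17:27:48 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435298",
    "Aug 30 17:28:29 echo kernel: ad14: FAILURE - WRITE_DMA timed out LBA=268435298",
    "Aug 30 17:28:29 echo kernel: ad18: FAILURE - WRITE_DMA timed out LBA=268435297",
    "Aug 30 17:28:29 echo kernel: ad18: FAILURE - WRITE_DMA timed out LBA=268435425",
]

for dev, lbas in sorted(failed_lbas(sample).items()):
    print(dev, lbas)
```

Only FAILURE lines are counted; the TIMEOUT retries are matched but skipped, so each block shows up once per device.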

Here the blocks are different, and 4 blocks overall are reported as having 
problems. In hex they all start with FFFFF. They are (including the one from 
the previous report):

268435297	fffff61
268435298	fffff62

268435340	fffff8c

268435425	fffffe1
268435426	fffffe2
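
The decimal-to-hex conversion above can be double-checked with a few lines of Python. This is just a sketch of the arithmetic; the comparison against 2^28 is an added observation and an assumption about what might matter, not something established in this report.

```python
import os

# LBAs from this report plus the one from the previous report.
lbas = [268435297, 268435298, 268435340, 268435425, 268435426]

# 2^28 = 0x10000000; chosen here as a reference point (assumption, see above).
LIMIT_28BIT = 1 << 28

hexes = [f"{lba:x}" for lba in lbas]
distances = [LIMIT_28BIT - lba for lba in lbas]

for lba, hx, dist in zip(lbas, hexes, distances):
    print(f"{lba}\t{hx}\t{dist} sectors below 2^28")

# commonprefix compares strings character by character, so it works on hex text too.
common = os.path.commonprefix(hexes)
print("common hex prefix:", common)
```

All five values share the prefix fffff and sit within 160 sectors below 0x10000000, the top of the 28-bit LBA range. Whether that is related to the timeouts is not shown by this arithmetic.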

Coincidence?
   This is on amd64, with all drives connected to the ICH9 ports on a 
Gigabyte Intel P35-based motherboard.
   CURRENT is from 25/8/2007.
   Cheers.

-- 
Mark Powell - UNIX System Administrator - The University of Salford
Information Services Division, Clifford Whitworth Building,
Salford University, Manchester, M5 4WT, UK.
Tel: +44 161 295 4837  Fax: +44 161 295 5888  www.pgp.com for PGP key
Received on Thu Aug 30 2007 - 16:13:48 UTC
