strange deadlock and magic resurrection with RELENG_6 (fwd)

From: Michael Reifenberger <mike_at_Reifenberger.com>
Date: Sun, 26 Mar 2006 10:43:45 +0200 (CEST)
Hi,
has anyone seen the failure below?
Is the failure present on -current too?

Bye/2
---
Michael Reifenberger, Business Development Manager SAP-Basis, Plaut Consulting
Comp: Michael.Reifenberger_at_plaut.de | Priv: Michael_at_Reifenberger.com
       http://www.plaut.de           |       http://www.Reifenberger.com


---------- Forwarded message ----------
Date: Fri, 24 Mar 2006 15:42:25 +0100 (CET)
From: Michael Reifenberger <mike_at_reifenberger.com>
To: pjd_at_freebsd.org
Subject: strange deadlock and magic resurrection with RELENG_6 (fwd)

Hi,
any clues about the issue below?

Bye/2


---------- Forwarded message ----------
Date: Thu, 23 Mar 2006 11:15:52 +0100 (CET)
From: Michael Reifenberger <mike_at_reifenberger.com>
To: FreeBSD Stable <freebsd-stable_at_freebsd.org>
Subject: strange deadlock and magic resurrection with RELENG_6

Hi,
I'm using a recent RELENG_6 on i386/SMP (Athlon X2 4800+).
The dmesg output is at http://people.freebsd.org/~mr/dmesg.log.gz

Root is on a gmirror volume (2 SATA disks); a backup FS is on graid3
(5 FireWire disks). This server acts as a Bacula server.

During a backup with Bacula I observed a complete system freeze
(no keyboard, NFS, disk...) after the following lines appeared on the screen:
...
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=108916879
ad1: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=116030287
ad1: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=108911183
ad1: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=108378767

Since I could still ping the system, and after waiting a couple of hours in
the hope that it would recover by itself, I issued a flood ping to this
machine and voila, after getting the following lines:
...
Limiting icmp ping response from 261 to 200 packets/sec
Limiting icmp ping response from 283 to 200 packets/sec
...

Everything went back to normal!

It seems to me that we have a deadlock condition somewhere in the kernel.
But how can this be debugged when everything is frozen?

BTW: I have already seen these DMA errors in the past; they seem to stem
from an interaction between ata and some geom modules. See my earlier post
regarding this issue.
Maybe the same issue has become fatal now after the latest gmirror/graid3
changes?

Has anyone else seen this?

Bye/2
Received on Sun Mar 26 2006 - 06:43:45 UTC
