I've recently brought a machine up from 5.3-STABLE to 6-CURRENT. It usually just sits in the corner and runs services, but lately I've come home from work or woken up to find that it is completely unresponsive, and I have to hard reset the machine. It happens at least once a day, and it's becoming more and more frequent. When I look at the console, I always have the same 4 messages before the failure:

ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2085599
ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2085599
kernel: ad0: FAILURE - WRITE_DMA timed out
kernel: g_vfs_done():ad0s1d[WRITE(offset=52772864, length=16384)]error = 5

It seems to me that a sector on the disk might be dead in the ad0s1d slice (/var), but before I take further steps I want to be certain that the behavior I'm experiencing is unrelated to the migration to 6-CURRENT.

I started poking around /var to see if anything was amiss, and I found that mail messages are stacking up in /var/spool/clientmqueue, even though nothing should be using the msp queue (I've redirected periodic outputs to logfiles). In the last daily run mailed to root in January, I found records in the submit queue that looked like this:

j0EDINHh049826 2489 Fri Jan 14 07:18 MAILER-DAEMON (Deferred: Permission denied)

There were nearly 500 of them. Even after redirecting periodic output to logs and clearing out the client mail queue, this continues to happen, and I have a hunch that it may be related to the WRITE_DMA timeouts, as it's the only weird behavior I can see on /var.

If anyone can help me shed some light on this, I'd appreciate it. I've had 2 IDE drives die in this machine already, and I'm going to be severely depressed if I've killed a third.

-Reid

Received on Fri Feb 18 2005 - 14:03:36 UTC
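As a rough aid for reading the g_vfs_done() console message quoted in the post, the byte offset and length it reports can be converted into sector units relative to the start of ad0s1d. The sketch below is only an illustration: the helper name parse_gvfs_error and the 512-byte sector size are assumptions, not something stated in the post, and the result is a partition-relative range, not an absolute LBA.

```python
import re

SECTOR_SIZE = 512  # assumption: the drive uses 512-byte sectors

# Hypothetical helper: pulls the fields out of a g_vfs_done() error line.
PATTERN = re.compile(
    r"g_vfs_done\(\):(\w+)\[(\w+)\(offset=(\d+), length=(\d+)\)\]\s*error = (\d+)"
)

def parse_gvfs_error(line):
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a g_vfs_done() error line")
    provider, op, offset, length, error = match.groups()
    offset, length = int(offset), int(length)
    first = offset // SECTOR_SIZE
    count = -(-length // SECTOR_SIZE)  # round up to whole sectors
    return {
        "provider": provider,          # e.g. ad0s1d
        "op": op,                      # e.g. WRITE
        "error": int(error),           # 5 is EIO (input/output error)
        "first_sector": first,         # relative to the start of the provider
        "last_sector": first + count - 1,
        "sector_count": count,
    }

line = "kernel: g_vfs_done():ad0s1d[WRITE(offset=52772864, length=16384)]error = 5"
info = parse_gvfs_error(line)
print(info)
```

If the same relative sector range shows up in every failure, that would point toward a fixed bad region on the disk; failures scattered across many ranges would make a cabling, controller, or driver cause more plausible.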