Re: 5.3-RELEASE: WARNING - WRITE_DMA interrupt timout

From: Zoltan Frombach <tssajo_at_hotmail.com>
Date: Fri, 19 Nov 2004 01:52:27 -0800
My problem is not related to a SATA controller. I use the onboard UDMA133
controller (pretty rare) with a Maxtor UDMA133 drive. It is a new ABIT
motherboard that uses a SiS chipset. The hard drive is not new, but
previously I used it only in UDMA100 mode, with another motherboard. See:

atapci0: <SiS 964 UDMA133 controller> port 
0x4000-0x400f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 2.5 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
ad0: 78167MB <Maxtor 6Y080L0/YAR41VW0> [158816/16/63] at ata0-master UDMA133

Everything works pretty well on this server, except that these WRITE_DMA
warning messages worry me. However, I have not been getting many of them
lately, and none since I installed Soren's patch a few hours ago.

I also figured out why my system became so unresponsive at times. I host
about 150 domains on this server, with email and everything. I use qmail as
the MTA, and by default it accepts all email for all hosted domains, even
when the mail is addressed to a non-existent user. It tries to bounce those
messages, but only later in the process. IMO, this is a very poor design
choice in qmail, an otherwise pretty powerful MTA. I also use qmail-scanner
with clamav and spamassassin. The qmail-scanner program and spamassassin
are written in Perl. So every single message that qmail accepts goes
through qmail-scanner (and therefore through clamav and spamassassin as
well), even the ones addressed to non-existent users...

Some of the hosted domains get hit really hard with spam at times, and
around then the server becomes very unresponsive. That is not surprising:
according to my maillog, from time to time some spammer sends literally
hundreds of junk messages to non-existent users, all within a few seconds.
Right then the server slows to a crawl. Last time, I couldn't access any
hosted web sites via HTTP or FTP for minutes, and it took me about 3
minutes to get in via SSH because of the slowness. Finally I was able to
see the reason: all those Perl processes scanning the junk mail... The
server had become the victim of a DoS attack caused by excessive spam. So I
believe this was the reason for the unresponsiveness. And it could be the
reason why I received those WRITE_DMA warnings at those times! I'm not 100%
sure about it, but I think it is possible.

I'm going to apply a patch to qmail in a few days. It makes qmail reject
messages sent to non-existent users immediately, so they won't need to be
scanned. This way, I believe, I will greatly reduce the load caused by this
flood of junk mail. Then hopefully these WRITE_DMA warnings will be gone,
too... We'll see.
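
Roughly, the idea is to check the recipient before any scanning happens.
Below is a minimal Python sketch of that ordering only. It is not qmail
code and not the actual patch; the function names, the list of valid
mailboxes, and the reply strings are all made up for illustration.

```python
# Hypothetical illustration: validate recipients BEFORE the expensive scan.
# None of these names come from qmail or qmail-scanner.

def filter_recipients(recipients, valid_mailboxes):
    """Split recipients into (accepted, rejected), ignoring case."""
    valid = {addr.lower() for addr in valid_mailboxes}
    accepted, rejected = [], []
    for rcpt in recipients:
        if rcpt.lower() in valid:
            accepted.append(rcpt)
        else:
            rejected.append(rcpt)
    return accepted, rejected


def handle_message(recipients, valid_mailboxes, scan):
    """Call scan() only when at least one recipient really exists."""
    accepted, rejected = filter_recipients(recipients, valid_mailboxes)
    if not accepted:
        # Nothing deliverable: refuse right away, no scanner is started.
        return "550 no such user", rejected
    scan(accepted)  # stands in for the virus/spam scanning step
    return "250 ok", rejected
```

With this ordering, a burst of mail to unknown users never reaches the
scanner at all, which is the load reduction I am hoping for.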

Zoltan

> At 7:33 PM -0800 11/18/04, Zoltan Frombach wrote:
>>For your information, I applied this patch just now to my kernel.
>>Sorry about the delay! I will send an update in a few days once I
>>see if those DMA_WRITE warnings are still happening or not.
>
> For those who may have missed my other message, it looks like all
> of my problems were related to a PCI-based SATA controller which
> was added by the store that built my machine.  This card was added
> even though I had selected a motherboard with on-board SATA.
>
> The problem controller was a:   <SiI 3112 SATA150 controller>
> and it has been causing me enough problems that I couldn't get
> through a buildworld to even try the suggested patch.
>
> I have now switched to the on-board: <VIA 6420 SATA150 controller>
> and so far I have not seen any more of these WRITE_DMA messages.
> None.  And I have been pounding the disk pretty hard with a
> variety of work for a few hours now.  So, now there is no point
> in me adding the patch, because I no longer see the message!
>
> It would still be nice if FreeBSD would react better to whatever
> problems this card causes.  I still have this stupid card, and I
> would be happy to mail it off to anyone who might want to debug the
> problems with it.  And if we *can't* fix it, then maybe we should
> just remove support for it.  I have had to rebuild my freebsd
> partitions several times now due to these problems, and certainly
> that wasn't much fun.  Although I guess my problems might also be
> partially due to the Western Digital drive I was using, when it is
> used in combination with this card.
>
> -- 
> Garance Alistair Drosehn            =   gad_at_gilead.netel.rpi.edu
> Senior Systems Programmer           or  gad_at_freebsd.org
> Rensselaer Polytechnic Institute    or  drosih_at_rpi.edu 
Received on Fri Nov 19 2004 - 08:53:03 UTC
