Recent ATA drivers giving problems with SATA

From: Joe Marcus Clarke <marcus_at_marcuscom.com>
Date: Wed, 26 Nov 2003 13:15:07 -0500

About a month ago, I bought a new SATA controller and a 160 GB Seagate
SATA drive for my -CURRENT machine.  All was working fine until about a
week ago.  Then, the drive started experiencing hard, unrecoverable DMA
errors.  I RMA'd the drive, then bought a new Maxtor 80 GB SATA drive
(just yesterday).  A buildworld on this drive consistently fails about
half-way through, though never at exactly the same place twice.  The
kernels I was running when the failures occurred were:

FreeBSD fugu.marcuscom.com 5.2-BETA FreeBSD 5.2-BETA #0: Mon Nov 24 23:14:49 EST 2003     gnome_at_fugu.marcuscom.com:/space/obj/usr/src/sys/FUGU  i386

FreeBSD fugu.marcuscom.com 5.1-CURRENT FreeBSD 5.1-CURRENT #0: Mon Nov 17 21:23:07 EST 2003     gnome_at_fugu.marcuscom.com:/space/obj/usr/src/sys/FUGU  i386

Kernels before that did not experience the problem.  The buildworld
fails with an Input/Output error, then I see the following on the
console:

Nov 26 02:35:12 fugu kernel: ad4: WARNING - WRITE_DMA recovered from missing interrupt
Nov 26 02:35:12 fugu kernel: ad4: FAILURE - WRITE_DMA status=ff<BUSY,READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> error=0
Nov 26 02:35:22 fugu kernel: ad4: WARNING - READ_DMA recovered from missing interrupt
Nov 26 02:35:22 fugu kernel: ad4: FAILURE - READ_DMA status=ff<BUSY,READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> error=0
...
Nov 26 04:37:24 fugu kernel: ad4: timeout sending command=ca
Nov 26 04:37:24 fugu kernel: ad4: error issuing DMA command
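
For readers decoding the messages above, here is a minimal Python
sketch of how a status byte maps onto the flag names the driver prints.
The bit positions are an assumption based on the classic ATA status
register layout (BUSY in bit 7 down to ERROR in bit 0), which matches
the order of the names in the log; the helper names are hypothetical
and not part of any FreeBSD tool.  A status of 0xff with every flag set
at once is not a plausible device state and commonly suggests that the
register read returned all ones, i.e. the device or controller stopped
responding, rather than that the drive reported a specific error.

```python
# Flag names as printed in the log, most significant bit first.
# Bit positions are assumed from the classic ATA status register layout.
STATUS_BITS = [
    (0x80, "BUSY"),
    (0x40, "READY"),
    (0x20, "DMA_READY"),
    (0x10, "DSC"),
    (0x08, "DRQ"),
    (0x04, "CORRECTABLE"),
    (0x02, "INDEX"),
    (0x01, "ERROR"),
]

# ATA command opcodes for the two commands that appear in the log.
COMMANDS = {0xC8: "READ_DMA", 0xCA: "WRITE_DMA"}


def decode_status(status):
    """Return the names of all flags set in an 8-bit status value."""
    return [name for mask, name in STATUS_BITS if status & mask]


def looks_like_floating_bus(status):
    """Heuristic: all bits set usually means nothing answered the read."""
    return status == 0xFF


if __name__ == "__main__":
    print(",".join(decode_status(0xFF)))
    print("command=ca is", COMMANDS[0xCA])
```

Under these assumptions, "command=ca" in the timeout message is the
same WRITE_DMA command that failed earlier in the log.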

At this point, the machine is unusable, and the above two lines scroll
by continuously until the machine is rebooted.  Here are the dmesg
specifics for the controller and drive:

atapci1: <SiI 3112 SATA150 controller> port 0x14b0-0x14bf,0x14c0-0x14c3,0x14c8-0x14cf,0x14c4-0x14c7,0x14d0-0x14d7 mem 0xe800a000-0xe800a1ff irq 9 at device 16.0 on pci0
GEOM: create disk ad4 dp=0xc5246460
ad4: 78167MB <Maxtor 6Y080M0> [158816/16/63] at ata2-master UDMA133
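
As a sanity check on the probe line above, the reported size can be
reproduced from the bracketed geometry.  This is a small sketch that
assumes the three numbers are cylinders/heads/sectors-per-track and
that sectors are 512 bytes; neither is stated in the dmesg output
itself, and the function name is hypothetical.

```python
SECTOR_BYTES = 512  # assumption: traditional ATA sector size


def chs_capacity_mb(cylinders, heads, sectors_per_track,
                    sector_bytes=SECTOR_BYTES):
    """Capacity in whole mebibytes implied by a C/H/S geometry."""
    total_sectors = cylinders * heads * sectors_per_track
    return total_sectors * sector_bytes // (1024 * 1024)


if __name__ == "__main__":
    # Geometry from the ad4 probe line: [158816/16/63]
    print(chs_capacity_mb(158816, 16, 63))  # prints 78167
```

The result agrees with the 78167MB in the probe line, so the drive is
at least being identified with a consistent size.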

Between the time things worked and now, nothing in the machine has
changed except the version of -CURRENT.  In addition to replacing the
drive, I have replaced the SATA cable as well.  My plan is to revert
the ATA drivers to their state of two weeks ago and see if the problem
persists.  Failing that, I will test whether this is a cooling problem.
Failing that, I will replace the SATA controller.  However, I wanted to
know whether I'm barking up the wrong tree and this is actually a
software issue.
Thanks.

Joe

-- 
PGP Key : http://www.marcuscom.com/pgp.asc



Received on Wed Nov 26 2003 - 09:15:10 UTC
