ATA kablooie

From: Matthew D. Fuller <fullermd_at_over-yonder.net>
Date: Mon, 12 Mar 2007 05:17:07 -0500
I have a box that until Friday night was running a Nov '05 -CURRENT
solidly.  After an upgrade, it started spewing out

kernel: ad4: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=38617823

style warnings at the slightest provocation.  A "find / -xdev -print |
xargs cat >> /dev/null" could bring it about in a second or two; not
uncommonly, the arduous effort of spawning off 'sh' for single-user
mode was enough to put it over the cliff.
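
For anyone wanting to see which drive takes the hits on a given boot, a
rough sketch of tallying these warnings per device might look like the
following.  The regex is modeled only on the single warning line quoted
above, so other ATA driver messages may not match it, and the
tally_icrc name is purely hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the one warning line quoted in this mail; this is an
# assumption, and other driver messages may be formatted differently.
ICRC_RE = re.compile(
    r"(?P<dev>ad\d+): WARNING - (?P<cmd>\w+) UDMA ICRC error "
    r"\(retrying request\) LBA=(?P<lba>\d+)"
)


def tally_icrc(lines):
    """Count ICRC warnings per device and collect the LBAs they hit."""
    counts = Counter()
    lbas = {}
    for line in lines:
        m = ICRC_RE.search(line)
        if m is None:
            continue  # not an ICRC warning; skip it
        dev = m.group("dev")
        counts[dev] += 1
        lbas.setdefault(dev, []).append(int(m.group("lba")))
    return counts, lbas


if __name__ == "__main__":
    sample = [
        "kernel: ad4: WARNING - READ_DMA UDMA ICRC error "
        "(retrying request) LBA=38617823",
    ]
    counts, lbas = tally_icrc(sample)
    for dev in sorted(counts):
        print(dev, counts[dev], lbas[dev])
```

Feeding it the lines of a saved kernel log (for instance an open file
object) should show whether ad4 or ad6 collects more of the errors.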

The system runs an ataraid RAID-1 across ad4 and ad6; which one got
the first errors was pretty much luck of the draw on any given boot.
They're on a Promise TX2200 card:

atapci0: <Promise PDC20571 SATA150 controller> port 0xc000-0xc07f,0xc400-0xc4ff mem 0xeb420000-0xeb420fff,0xeb400000-0xeb41ffff irq 15 at device 13.0 on pci0

The card/drives were tried in 3 very different motherboards, all of
which failed identically.  BIOSen were scoured for "make PCI edgy"
options, which were all turned off (though none exhibited an "enable
bus master" option, as one seemingly-related mail thread ended with).
I tried using the loader variable to force the drives to PIO mode to
jam the brakes on, but it didn't seem to work at all (maybe it doesn't
affect SATA?).  I tried splitting the RAID so it only dealt with one
drive; made no difference.

The -CURRENT build was from identical sources to those currently
sitting on this machine, so I can supply $Id$'s if it'll help.  Sadly,
the system needed to be running, so it's not available for further
experimentation.  It ran flawlessly with that Nov '05 -CURRENT, and is
now running flawlessly on RELENG_6.


-- 
Matthew Fuller     (MF4839)   |  fullermd_at_over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
           On the Internet, nobody can hear you scream.
Received on Mon Mar 12 2007 - 09:17:08 UTC
