ATA + DMA still giving repeatable freezes

From: Kirk Strauser <kirk_at_strauser.com>
Date: Thu, 11 Dec 2003 22:48:33 -0600

I built world after cvsup'ing -CURRENT this morning and am still having the
same ATA READ_DMA hangs that started in early October on my system.  I can
repeat the hangs at will; the machine serves as an Amanda server, and
launching a backup for itself plus 3 client machines is guaranteed to
trigger it:

    ad0: TIMEOUT - READ_DMA retrying (2 retries left)
    ata0: resetting devices ..
    ad0: FAILURE - already active DMA on this device
    ad0: setting up DMA failed

When this happens, the system is effectively dead until I reset it.  I can
run for days on end by booting with DMA disabled, but that isn't a viable
long-term solution because it slows the system to a crawl.
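
For anyone trying to reproduce this, a small script can count how often each
of the console messages quoted above appears in a saved log.  This is only a
sketch: the function name count_ata_events and the SAMPLE text are made up
for illustration, and the patterns are copied from the four lines shown
earlier, so other ATA messages are ignored.

```python
import re
from collections import Counter

# Patterns copied from the console messages quoted in this post.
# search() is used so a syslog prefix ("... kernel: ") does not matter.
PATTERNS = {
    "timeout": re.compile(r"\b(ad\d+): TIMEOUT - READ_DMA retrying"),
    "reset": re.compile(r"\b(ata\d+): resetting devices"),
    "already_active": re.compile(r"\b(ad\d+): FAILURE - already active DMA"),
    "setup_failed": re.compile(r"\b(ad\d+): setting up DMA failed"),
}

# Hypothetical sample input: the same four lines as in the report.
SAMPLE = """\
ad0: TIMEOUT - READ_DMA retrying (2 retries left)
ata0: resetting devices ..
ad0: FAILURE - already active DMA on this device
ad0: setting up DMA failed
"""


def count_ata_events(text):
    """Return a Counter keyed by (device, event name) for matching lines."""
    counts = Counter()
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(match.group(1), name)] += 1
                break  # at most one event per log line
    return counts


if __name__ == "__main__":
    for (device, event), n in sorted(count_ata_events(SAMPLE).items()):
        print(device, event, n)
```

Feeding it the contents of a saved console log instead of SAMPLE gives a
per-device tally, which makes it easier to compare runs before and after a
kernel change.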

The drive in question is a Western Digital WD1200JB-00DUA3 (Caviar 120GB
Special Edition) attached to an Asus P3V4X (VIA chipset) motherboard.  The
combination worked perfectly from the server's 4.8-STABLE days, through
5.0, and up until about two months ago, when the hangs began immediately
after an upgrade.

Kernel config is essentially "GENERIC" with the older CPU types and WITNESS*
and INVARIANT* options commented out, and with the SYS-V IPC settings
recommended by PostgreSQL added.  Build flags are very conservative:
"CFLAGS= -O -pipe".

smartctl (from sysutils/smartmontools) reports:

    SMART overall-health self-assessment test result: PASSED

Basically, I'm about 99% sure that this hardware is OK.  It worked right up
to a big ATAng commit, then stopped working immediately afterward.
Does anybody have any suggestions for how I can run my machine in UDMA33/66
mode for more than a couple of hours without freezing?

Below is the dmesg.  I didn't want to stick it in the middle of my post:

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 5.2-CURRENT #1: Thu Dec 11 14:13:32 CST 2003
    root@kanga.honeypot.net:/usr/obj/usr/src/sys/KANGA
Preloaded elf kernel "/boot/kernel/kernel" at 0xc0a7d000.
Preloaded elf module "/boot/kernel/linprocfs.ko" at 0xc0a7d1f4.
Preloaded elf module "/boot/kernel/linux.ko" at 0xc0a7d2a4.
Preloaded elf module "/boot/kernel/acpi.ko" at 0xc0a7d350.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel Pentium III (936.74-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x683  Stepping = 3
  Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory  = 805289984 (767 MB)
avail memory = 772505600 (736 MB)
Pentium Pro MTRR support enabled
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <ASUS   P3V_4X  > on motherboard
pcibios: BIOS version 2.10
Using $PIR table, 8 entries at 0xc00f0e60
acpi0: Power Button (fixed)
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0
acpi_cpu0: <CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib0: slot 4 INTD is routed to irq 9
pcib0: slot 9 INTA is routed to irq 9
pcib0: slot 10 INTA is routed to irq 9
pcib0: slot 11 INTA is routed to irq 10
pcib0: slot 12 INTA is routed to irq 11
agp0: <VIA 82C691 (Apollo Pro) host to PCI bridge> mem 0xe4000000-0xe7ffffff at device 0.0 on pci0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
isab0: <PCI-ISA bridge> at device 4.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <VIA 82C596B UDMA66 controller> port 0xd800-0xd80f at device 4.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata0: [MPSAFE]
ata1: at 0x170 irq 15 on atapci0
ata1: [MPSAFE]
uhci0: <VIA 83C572 USB controller> port 0xd400-0xd41f irq 9 at device 4.2 on pci0
usb0: <VIA 83C572 USB controller> on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
ulpt0: HewLett Packard HP LaserJet 1200, rev 1.10/1.00, addr 2, iclass 7/1
ulpt0: using bi-directional mode
ukbd0: Belkin Components USB-PS2 Adapter, rev 1.10/1.20, addr 3, iclass 3/1
kbd0 at ukbd0
ums0: Belkin Components USB-PS2 Adapter, rev 1.10/1.20, addr 3, iclass 3/1
ums0: 5 buttons and Z dir.
viapropm0: SMBus I/O base at 0xe800
viapropm0: <VIA VT82C596A Power Management Unit> port 0xe800-0xe80f at device 4.3 on pci0
viapropm0: SMBus revision code 0x0
smbus0: <System Management Bus> on viapropm0
smb0: <SMBus generic I/O> on smbus0
fxp0: <Intel 82559 Pro/100 Ethernet> port 0xd000-0xd03f mem 0xd6800000-0xd68fffff,0xd7000000-0xd7000fff irq 9 at device 9.0 on pci0
fxp0: Ethernet address 00:d0:b7:0e:3a:4a
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: <Intel 82559 Pro/100 Ethernet> port 0xb800-0xb83f mem 0xd5800000-0xd58fffff,0xd6000000-0xd6000fff irq 9 at device 10.0 on pci0
fxp1: Ethernet address 00:d0:b7:9e:bb:dd
miibus1: <MII bus> on fxp1
inphy1: <i82555 10/100 media interface> on miibus1
inphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
sym0: <875> port 0xb400-0xb4ff mem 0xd4800000-0xd4800fff,0xd5000000-0xd50000ff irq 10 at device 11.0 on pci0
sym0: Tekram NVRAM, ID 7, Fast-20, SE, parity checking
pci0: <display, VGA> at device 12.0 (no driver attached)
fdc0: <Enhanced floppy controller (i82077, NE72065 or clone)> port 0x3f7,0x3f2-0x3f5 irq 6 drq 2 on acpi0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
ppc0 port 0x778-0x77b,0x378-0x37f irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
sio0 port 0x3f8-0x3ff irq 4 on acpi0
sio0: type 16550A
sio1 port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
orm0: <Option ROMs> at iomem 0xd4000-0xd4fff,0xd0000-0xd0fff,0xcc000-0xcffff,0xc0000-0xcafff on isa0
pmtimer0 on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio2 at port 0x3e8-0x3ef irq 5 on isa0
sio2: type 16450
sio3: configured irq 9 not in bitmap of probed irqs 0
sio3: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 936743135 Hz quality 800
Timecounters tick every 10.000 msec
acpi_cpu: throttling enabled, 16 steps (100% to 6.2%), currently 100.0%
GEOM: create disk ad0 dp=0xc639f760
ad0: 114473MB <WDC WD1200JB-00DUA3> [232581/16/63] at ata0-master UDMA66
Waiting 15 seconds for SCSI devices to settle
(probe1:sym0:0:1:0): phase change 6-7 6@002fe78c resid=4.
(probe2:sym0:0:2:0): phase change 6-2 6@0035f98c resid=5.
sa0 at sym0 bus 0 target 6 lun 0
sa0: <SEAGATE DAT    9SP40-000 9100> Removable Sequential Access SCSI-3 device
sa0: 40.000MB/s transfers (20.000MHz, offset 16, 16bit)
GEOM: create disk cd0 dp=0xc6394600
GEOM: create disk da0 dp=0xc63aec50
(cd0:sym0:0:2:0): phase change 6-2 6@0035f98c resid=5.
cd0 at sym0 bus 0 target 2 lun 0
cd0: <RICOH RO-1420C 1.61> Removable CD-ROM SCSI-2 device
cd0: 3.300MB/s transfers
cd0: Attempt to query device size failed: NOT READY, Medium not present
da0 at sym0 bus 0 target 1 lun 0
da0: <IOMEGA ZIP 100 J.03> Removable Direct Access SCSI-2 device
da0: 3.300MB/s transfers
da0: Attempt to query device size failed: NOT READY, Medium not present
(cd0:sym0:0:2:0): phase change 6-2 6@0035f98c resid=5.
(cd0:sym0:0:2:0): phase change 6-2 6@0035f98c resid=5.
(cd0:sym0:0:2:0): phase change 6-2 6@0035f98c resid=5.
(cd0:sym0:0:2:0): phase change 6-2 6@0035f98c resid=5.
(da0:sym0:0:1:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
(da0:sym0:0:1:0): CAM Status: SCSI Status Error
(da0:sym0:0:1:0): SCSI Status: Check Condition
(da0:sym0:0:1:0): NOT READY asc:3a,0
(da0:sym0:0:1:0): Medium not present
(da0:sym0:0:1:0): Unretryable error
Opened disk da0 -> 6
(da0:sym0:0:1:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
(da0:sym0:0:1:0): CAM Status: SCSI Status Error
(da0:sym0:0:1:0): SCSI Status: Check Condition
(da0:sym0:0:1:0): NOT READY asc:3a,0
(da0:sym0:0:1:0): Medium not present
(da0:sym0:0:1:0): Unretryable error
Opened disk da0 -> 6
Mounting root from ufs:/dev/ad0s1a
WARNING: / was not properly dismounted
WARNING: /home was not properly dismounted
/home: mount pending error: blocks 4 files 1
/home: superblock summary recomputed
WARNING: /tmp was not properly dismounted
WARNING: /usr was not properly dismounted
/usr: mount pending error: blocks 24 files 2
WARNING: /usr/export was not properly dismounted
WARNING: /usr/share was not properly dismounted
WARNING: /var was not properly dismounted
/var: mount pending error: blocks 360 files 7
/var: superblock summary recomputed
WARNING: /var/amanda was not properly dismounted
/var/amanda: superblock summary recomputed
WARNING: /var/jail was not properly dismounted
/var/jail: mount pending error: blocks 5420 files 3
/var/jail: superblock summary recomputed

-- 
Kirk Strauser

"94 outdated ports on the box,
 94 outdated ports.
 Portupgrade one, an hour 'til done,
 82 outdated ports on the box."

Received on Thu Dec 11 2003 - 19:48:51 UTC