Re: [kern/68351] bge0 watchdog timeout on 5.2.1 and -current, 5.1 is ok

From: Vadim Mikhailov <freebsd-bugs_at_mikhailov.org>
Date: Mon, 28 Jun 2004 10:32:00 -0700

Hi,

I have a Dell PowerEdge 1750 server with two 3.0 GHz Xeon CPUs, 4 GB RAM and
two onboard gigabit Ethernet ports:

bge0: <Broadcom BCM5704C Dual Gigabit Ethernet, ASIC rev. 0x2002> mem
0xfcd20000-0xfcd2ffff,0xfcd30000-0xfcd3ffff irq 17 at device 0.0 on pci2
bge1: <Broadcom BCM5704C Dual Gigabit Ethernet, ASIC rev. 0x2002> mem
0xfcd00000-0xfcd0ffff,0xfcd10000-0xfcd1ffff irq 18 at device 0.1 on pci2
      
Only bge0 is used, with jumbo frames (my gigabit switch, a PowerConnect 5224,
supports them):

bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 9000
    options=1b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING>
    inet 172.xx.xx.xx netmask 0xfffff800 broadcast 172.xx.xx.255
    ether 00:06:5b:ef:63:e6
    media: Ethernet autoselect (1000baseTX <full-duplex>)
    status: active

This box has two dual-port SCSI adapters:

mpt0: <LSILogic 1030 Ultra4 Adapter> port 0xbc00-0xbcff mem
0xfcb20000-0xfcb2ffff,0xfcb30000-0xfcb3ffff irq 13 at device 5.0 on pci4
mpt1: <LSILogic 1030 Ultra4 Adapter> port 0xb800-0xb8ff mem
0xfcb00000-0xfcb0ffff,0xfcb10000-0xfcb1ffff irq 16 at device 5.1 on pci4
ahc0: <Adaptec 3960D Ultra160 SCSI adapter> port 0xdc00-0xdcff mem
0xfcf01000-0xfcf01fff irq 19 at device 4.0 on pci1
ahc1: <Adaptec 3960D Ultra160 SCSI adapter> port 0xd800-0xd8ff mem
0xfcf00000-0xfcf00fff irq 20 at device 4.1 on pci1

Each adapter has disks attached. Firmware on the motherboard and all
peripheral devices has been upgraded to the latest versions from Dell.
This setup works more or less OK under FreeBSD 5.1-RELEASE-p8 (GENERIC
kernel with SMP enabled), but once every month or two the machine reboots
under load, so I want to upgrade it to 5.2.1-RELEASE.
But when I boot a 5.2.1-RELEASE or later (-current) kernel on this box,
the network adapter locks up.
I see these messages on the console and in the logs:

Jun 25 15:25:22 vortex kernel: bge0: watchdog timeout -- resetting
						   
If I do "ifconfig bge0 down up", the network becomes available for a few
seconds and then the machine is not pingable again. I ran "systat -v" and
noticed that ping stops working exactly when I see any interrupt coming to
mpt or ahc (i.e. on any disk activity).
						   
One visible difference between 5.1 (where it works) and 5.2.1/-current
(where it doesn't) is that interrupts are assigned to PCI devices
differently:

IRQ map under 5.1: mpt0 13, mpt1 16, bge0 17, bge1 18, ahc0 19, ahc1 20,
  and under 5.2.1: mpt0 18, mpt1 19, bge0 16, bge1 17, ahc0 20, ahc1 21.
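
For reference, the two maps can be compared mechanically. This is only a
minimal Python sketch: the dictionaries restate the numbers listed above
(taking the second bge entry in the 5.1 list to be bge1, as the dmesg output
for bge1 shows irq 18), and the helper names shared_irqs and irq_changes are
made up for illustration. It prints which devices moved and checks whether
any two devices share an IRQ line under either kernel.

```python
# IRQ assignments as listed in this message.
# Assumption: the 5.1 entry for IRQ 18 is bge1, matching the dmesg output.
irq_51 = {"mpt0": 13, "mpt1": 16, "bge0": 17, "bge1": 18, "ahc0": 19, "ahc1": 20}
irq_521 = {"mpt0": 18, "mpt1": 19, "bge0": 16, "bge1": 17, "ahc0": 20, "ahc1": 21}


def shared_irqs(irq_map):
    """Return {irq: [devices]} for every IRQ used by more than one device."""
    by_irq = {}
    for dev, irq in irq_map.items():
        by_irq.setdefault(irq, []).append(dev)
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}


def irq_changes(old, new):
    """Return {device: (old_irq, new_irq)} for devices whose IRQ differs."""
    return {dev: (old[dev], new[dev]) for dev in old if old[dev] != new[dev]}


changes = irq_changes(irq_51, irq_521)
for dev, (before, after) in sorted(changes.items()):
    print(f"{dev}: {before} -> {after}")

print("shared under 5.1:  ", shared_irqs(irq_51))
print("shared under 5.2.1:", shared_irqs(irq_521))
```

With the numbers above, every device gets a different IRQ under 5.2.1 than
under 5.1, and neither map shows two devices on the same line, so plain IRQ
sharing between bge0 and the SCSI adapters does not appear to be the
difference.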

I have tried changing the IRQ assignment of PCI devices in the BIOS, but it
didn't change anything from FreeBSD's point of view. I have also tried
booting 5.2.1 with ACPI disabled; the result is the same. Disabling jumbo
frames does not seem to have any effect either.
I also tried this on another identical 1750 box (I have a few of them), with
the same result.
It works fine under Linux kernel 2.4.18.

Is there any way I can track this down? I can provide more information
(verbose boot logs etc.) if needed.
All this information has also been filed in this bug report:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/68351

Thanks!

--
Vadim Mikhailov
Received on Mon Jun 28 2004 - 15:32:05 UTC
