Re: amd64/115126: [nfe] nfe0: watchdog timeout (missed Tx interrupts) -- recovering (UP with SCHED_ULE)

From: Pyun YongHyeon <pyunyh_at_gmail.com>
Date: Wed, 23 Apr 2008 17:22:40 +0900

On Tue, Apr 22, 2008 at 09:28:39AM +0200, Luigi Rizzo wrote:
 > related to this bug, i am seeing similar problems with RELENG_7 and amd64,
 > with an ASUS M2N-VM DVI motherboard
 > http://www.asus.com/products.aspx?modelmenu=1&model=1841&l1=3&l2=101&l3=567&l4=0
 > and an Athlon64-BE2400 dual core CPU.
 > 
 > Under heavy load, e.g. scp-ing a large file over the local network,
 > and at the same time doing a buildkernel or building a port,
 > and with X11 active (using the 'vesa' xorg driver)
 > the network card stalls and doesn't recover - i waited over 10 minutes
 > hoping for the watchdog or some timeout to kick in, the only way
 > to bring the link back up was
 > 
 > 	ifconfig nfe0 down ; ifconfig nfe0 up
 > 	dhclient nfe0
 > 
 > doing only ifconfig down/up or only dhclient did not help, i needed both.
 > 
 > vmstat -i says the network card has irq256 (???) and it is not shared with
 > other devices. Ehci, sound, ohci, ata, and others have low irq numbers
 > (6, 14, 20, 21, 22), some shared, some not.
 > 
 > Changing the bios setting for PnP OS from 'yes' to 'no' or vice versa
 > does not change the situation.
 > 

Your BIOS may have an ASF-related option for the onboard NIC.
Try toggling that option and see how it goes.

 > The stall seems related to the presence of other activity - if i
 > let the bulk scp transfer alone, i get a happy 10-10.5Mbytes/s
 > (over a 100meg link).
 > 
 > When the stall occurs, i see no interrupts (vmstat -i counts
 > for irq256 says the same),
 > Packets are still transmitted and received on the other side, it's
 > the rx side of the card that becomes deaf. I don't see any
 > watchdog timeout or other error messages in /var/log/messages.
 > 
 > Also, enabling polling does not help getting traffic in
 > (with a kernel built with DEVICE_POLLING,
 > doing sysctl kern.polling.enable=1 and "ifconfig nfe0 polling").
 > 
 > So i suspect that for some reason the rx ring becomes confused
 > and does not recover.
 > 

Just a vague guess: how about disabling MSI/MSI-X in loader.conf?
(hw.nfe.msi_disable="1", hw.nfe.msix_disable="1")
If you are using jumbo frames, try disabling them too.
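
For reference, a minimal sketch of the entries, assuming the stock
/boot/loader.conf location. These are loader tunables, so they should
only take effect after a reboot:

```
# /boot/loader.conf
# Disable MSI and MSI-X for nfe(4) so the driver falls back to a
# legacy (INTx) interrupt.
hw.nfe.msi_disable="1"
hw.nfe.msix_disable="1"
```

After rebooting, "vmstat -i" is one way to check whether nfe0 still sits
on irq256. As far as I know, IRQ numbers from 256 upward are the ones
handed out for MSI vectors, so a lower number would suggest the tunables
were picked up. If jumbo frames are in use, "ifconfig nfe0 mtu 1500"
should put the interface back to the standard MTU.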

 > Hope this helps...
 > 

It would be even better if you could post the verbose boot messages
related to nfe(4) and the PHY driver.
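
One possible way to pull out just those lines is sketched below. The
function name filter_nfe_boot is made up for illustration, and the
pattern assumes the interesting lines start with a device name such as
nfe0, miibus0 or some *phy0; continuation lines without that prefix
would be missed, so treat it as a rough first pass only:

```sh
#!/bin/sh
# Hypothetical helper: keep only boot-message lines that begin with an
# nfe(4), miibus or PHY device name (e.g. nfe0:, miibus0:, e1000phy0:).
# Reads /var/run/dmesg.boot by default; another file can be given as $1.
filter_nfe_boot() {
    grep -E '^(nfe|miibus|[a-z0-9]*phy)[0-9]+:' "${1:-/var/run/dmesg.boot}"
}
```

To get verbose messages in the first place, boot with "boot -v" at the
loader prompt (or set boot_verbose="YES" in /boot/loader.conf), then run
something like "filter_nfe_boot > nfe-boot.txt" and post the result.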

 > cheers
 > luigi

-- 
Regards,
Pyun YongHyeon
Received on Wed Apr 23 2008 - 06:52:15 UTC
